Examiner: Audet, Maury A.



§ 371 Date: January 23, 2002 



Atty. Docket: 1581.0870000/TJS/LDB 



For: Use of a Lectin or Conjugates for 
Modulation of C-Fibre Activity 



Declaration Under 37 C.F.R. § 1.132 of Dr. John Chaddock



Assistant Commissioner for Patents 
Washington, D.C. 20231 



I, the undersigned, Dr. John Chaddock, do hereby solemnly and sincerely declare that: 

1. I am a named inventor of the subject matter described and claimed in U.S. Patent
Application No. 09/937,484.

2. I am currently Head of Molecular Biology at Syntaxin, Ltd. in the United Kingdom. As
evidenced by the abridged version of my curriculum vitae (attached), I have been actively
undertaking research in the field of lectin conjugates for the last 18 years. I am an expert in the
field of modulating C-fibre neuron activity for therapeutic purposes.

3. I am familiar with the Office Action dated 29 November 2005 for the above-captioned case. 



Sir: 
4. Lectin compounds are a well-recognized group of structurally and functionally related
molecules that share common structural features. As a first structural feature, lectins possess a
highly conserved binding-site triad of amino acids, which is known as the "Asp-Gly-Asn triad".
The conserved Asp-Gly-Asn triad is disclosed, for example, in Svensson et al., J. Mol. Biol. 321:
69-83 (2002) (enclosed herein as Exhibit A), and in Loris et al., J. Mol. Biol. 335: 1227-1240
(2004) (enclosed herein as Exhibit B).

5. As a second structural feature, lectins possess a "lectin fold", which consists primarily of
three β-sheets:

- a "flat" six-stranded "back" β-sheet;

- a small "top" β-sheet; and

- a curved, seven-stranded "front" β-sheet.

The "lectin fold" is described in detail in Chandra et al., Protein Eng. 14: 857-866 (2001)
(enclosed herein as Exhibit C), and in Turton et al., Glycobiology 14: 923-929 (2004) (enclosed
herein as Exhibit D). The "lectin fold" facilitates presentation of a substrate (sugar) to the lectin
in a concave region of the "front" β-sheet. The highly conserved amino acid triad consisting of
Asp, Asn and Gly is found within the "lectin fold."

6. The primary sequence of Erythrina cristagalli lectin (ECL) is disclosed in Figure 2 of
Exhibit A. Alignment of the primary sequence of ECL with the sequence of the Erythrina
corallodendron lectin (ECorL) in the same Figure 2 of Exhibit A shows that ECL shares 96%
sequence homology with ECorL, and reveals 100% identity in the conserved Asp-Gly-Asn triad:
Asp89, Gly107 and Asn133.

7. The primary sequence of the first 244 amino acids of the Erythrina corallodendron lectin
(ECorL) was first disclosed in Adar et al., FEBS Lett. 257: 81-85 (1989) (enclosed herein as
Exhibit E). Comparison of the amino acid sequence of ECorL with that of other legume lectins
reveals a high degree of homology overall and 100% identity in the conserved Asp-Gly-Asn
triad. See Figure 3 in Exhibit E. This highly conserved lectin binding site "Asp-Gly-Asn triad"
is well known to those skilled in the art.

8. It is understood by researchers in the field of lectins that the Asp-Gly-Asn triad and the 
lectin fold are highly conserved distinguishing features of all lectins. 

In this regard, Loris, R. et al., Proteins 20(4): 330-346 (1994) (enclosed herein as
Exhibit F) describes monosaccharide recognition by two forms of lentil lectin via a highly
conserved triad of residues, namely Asp81, Gly99 and Asn125. Thomas, C.L. and Surolia, A.,
Biochem. Biophys. Res. Commun. 268 (2000) (enclosed herein as Exhibit G) confirms that
an invariant triad of residues, Asp87, Gly105 and Asn137, coordinates recognition of L-fucose
by fucose-binding legume lectins. These references confirm that the "Asp-Gly-Asn triad" is
essential for ligand recognition in a variety of lectins whose crystal structures are known.

Rao, V.S. et al., J. Biomol. Struct. Dyn. 15(5): 853-860 (1998) (enclosed herein as
Exhibit H) confirms, with reference to soybean agglutinin, that the invariant "Asp-Gly-Asn triad"
is essential for binding carbohydrate. Together with an aromatic residue (Phe or Tyr), these
three invariant residues provide the basic frame for the sugar to bind. This is confirmed by Rao,
V.S. et al., Int. J. Biol. Macromol. 23(4): 295-307 (1998) (enclosed herein as Exhibit I), which
reviews the sugar binding sites of Erythrina corallodendron lectin (ECorL), peanut lectin (PNA),
Lathyrus ochrus lectin (LOLI) and pea lectin (PSL) and confirms that the invariant residue Asp (from
loop A), the invariant residue Gly (from loop B) and the invariant residue Asn (from loop C) are
required for a tight interaction with the sugar molecule.

9. I further state that all statements made on my own knowledge are true and that all
statements made on information and belief are believed to be true and further that willful false
statements and the like are punishable by fine or imprisonment or both, under Section 1001 of
Title 18 of the U.S. Code and that such willful false statements may jeopardize the validity of the
application or any patent issuing thereupon.



Date: [illegible]

John Chaddock, Ph.D.




515090.1

A. Positions and Honors

Positions and Employment 

1992-1995 Postdoctoral Research, University of Warwick, UK 
1995-1996 Postdoctoral Research, University of Warwick, UK 
1996-2001 Scientist, Centre for Applied Microbiology & Research, Salisbury, UK
High-resolution Crystal Structures of Erythrina
cristagalli Lectin in Complex with Lactose and
The primary sequence of Erythrina cristagalli lectin (ECL) was mapped by
mass spectrometry, and the crystal structures of the lectin in complex
with lactose and 2'-α-L-fucosyllactose were determined at 1.6 Å and 1.7 Å
resolution, respectively. The two complexes were compared with the
crystal structure of the closely related Erythrina corallodendron lectin
(ECorL) in complex with lactose, with the crystal structure of the Ulex
europaeus lectin II in complex with 2'-α-L-fucosyllactose, and with two
modeled complexes of ECorL with 2'-α-L-fucosyl-N-acetyllactosamine.
The molecular models are very similar to the crystal structure of ECL in
complex with 2'-α-L-fucosyllactose with respect to the overall mode of
binding, with the L-fucose fitting snugly into the cavity surrounded by
Tyr106, Tyr108, Trp135 and Pro134 adjoining the primary combining site
of the lectin. Marked differences were, however, noted between the models
and the experimental structure in the network of hydrogen bonds and
hydrophobic interactions holding the L-fucose in the combining site of
the lectin, pointing to limitations of the modeling approach. In addition
to the structural characterization of the ECL complexes, an effort was
undertaken to correlate the structural data with thermodynamic data
obtained from microcalorimetry, revealing the importance of the water
network in the lectin combining site for carbohydrate binding.

© 2002 Elsevier Science Ltd. All rights reserved.

Keywords: crystal structure; glycobiology; lectin; protein-carbohydrate
interactions; structure/function



Introduction 

Specific recognition of carbohydrates by proteins
lies at the heart of many biological processes, ranging
from cell-cell interaction and adhesion of
infectious agents to host cells, cellular signaling
and differentiation, malignancy and metastasis, to
fertilization and immune response. Understanding
the roles of carbohydrates in these processes
and how they interact with proteins is expected to
have a large impact on the development of new
treatments against many human diseases.

Abbreviations used: ECL, Erythrina cristagalli lectin;
ECorL, Erythrina corallodendron lectin; ELLA, enzyme-linked
lectin assay; Fuc, L-fucose; fucLac (fucosyllactose),
2'-α-L-fucosyllactose; fucosyl-N-acetyllactosamine,
2'-α-L-fucosyl-N-acetyllactosamine; Gal, galactose; Glc,
glucose; ITC, isothermal titration calorimetry; Lac,
lactose; MPD, 2-methyl-2,4-pentanediol; MS, mass
spectrometry; rms, root mean square; PBS, phosphate-buffered
saline; UEA-II, Ulex europaeus lectin II; WBA-I
and II, winged bean acidic lectin I and II.

Legume lectins are especially well suited as a 
model system to study the molecular basis of 
protein-carbohydrate recognition because they are 
structurally similar and yet their specificity is 
diverse. By now, the three-dimensional structures
of 20 legume lectins have been solved by high-resolution
X-ray crystallography, both in free form and in
complex with a variety of carbohydrate ligands.†



† See 3D Lectin Data Bank on World Wide Web URL:
http://www.cermav.cnrs.fr/databank/lectine



0022-2836/02/$ - see front matter © 2002 Elsevier Science Ltd. All rights reserved
Crystal Structures of ECL Carbohydrate Complexes 71

[Figure 2: sequence alignment not recoverable from the scan. Rows: ECorL, ECL-MS (two mass-spectrometric determinations) and ECL-X-ray, annotated with loops and β-strands.]

Figure 2. Alignment of the peptide sequences of ECL and ECorL (ECorL sequence from SWISS-PROT). The ECL sequence was determined both by mass spectrometry (CM, present study; P. Thibault, personal communication, on ECL from a different source) and from crystallographic electron density maps. Secondary structural elements and important interaction partners of the lactose ligand (the Asp89-Gly107-Asn133 triad and Ala218-Gln219, underlined) are highlighted. Differences between the four sequences are indicated in bold.

One of the most thoroughly studied of these is the
Gal/GalNAc-specific Erythrina corallodendron lectin
(ECorL), the best natural ligand of which is the H-2
blood type determinant 2'-fucosyl-N-acetyllactosamine.
The crystal structures of ECorL and of its complexes
with several mono- and disaccharides, notably lactose
and N-acetyllactosamine, have been determined,7 and
models have been presented of [...] and thermodynamic
data is currently under discussion.

In the present investigation, we have chosen to
study several highly homologous legume lectins
from the Erythrina family as a model system for
protein-carbohydrate interactions. Particular
emphasis is on ECL,17 which, despite its high
homology, exhibits subtle but distinct differences
in carbohydrate binding properties compared to
the other members of this family. The aim of the
current study was to relate the results from
carbohydrate-binding assays and microcalorimetry to
structural data from X-ray crystallography and
primary sequence analysis in order to characterize
the underlying protein-carbohydrate interactions
of this system, both in structural and thermodynamic
terms.

Results and Discussion

Carbohydrate binding studies

The lectins of the Erythrina family were initially
characterized as specific for galactose and
N-acetylgalactosamine, with a pronounced preference for
N-acetyllactosamine. However, binding to solid
phase-immobilized glycosphingolipids, microcalorimetry,
and enzyme-linked lectin assay (ELLA)9 demonstrated
that ECL and ECorL interact more strongly with
fucosyllactose and fucosyl-N-acetyllactosamine.
Subsequently, a difference in the carbohydrate binding
preferences of the two lectins was demonstrated.
While ECL bound to



Figure 1. Binding of 125I-labeled Erythrina lectins to serial dilutions of glycosphingolipids in microtiter wells. The
assay was performed as described in Materials and Methods. Data are presented as mean values of triplicate
determinations, after subtraction of the (low) background values.



Table 1. Data collection and refinement statistics

Dataset                                  ECL-Lac              ECL-FucLac
PDB code                                 1GZC                 1GZ9
Resolution (Å)                           1.6                  1.7
Number of observed reflections           551,906 (38,515)     429,936 (59,733)
Number of unique reflections             56,614 (6066)        46,349 (7167)
Completeness (%)                         [illegible] (99.4)   99.3 (99.5)
R-factor (%)                             20.8                 20.8
Rfree (%)                                22.6                 23.9
Rmsd bond lengths (Å)                    0.018                0.014
Rmsd bond angles (°)                     1.90                 1.96
Real-space correlation coefficient (%)   92.4                 91.6
Ramachandran profile (%)
  Most favorable                         [illegible]          [illegible]
  Additionally allowed                   11.3                 11.3
  Generously allowed                     0.3                  0.3
  Disallowed                             0.0                  0.0

[Rows for unit cell, space group, I/σ(I) and redundancy are not recoverable from the scan.]
N-acetyllactosamine- and fucosyl-N-acetyllactosamine-terminated
compounds with similar affinity,19 the latter compounds
were the preferential ligands for ECorL.6 However, no
difference in binding affinities of the two lectins was
observed for complex-type oligosaccharides and synthetic
cluster glycosides, which do not contain a fucose
residue. Glycosphingolipid binding experiments
with other members of the Erythrina lectin family
(E. rubrinervia, E. vespertilio, E. lysistemon, E. caffra
and E. flabelliformis; Figure 1(c)-(g)) demonstrated
that their carbohydrate binding preferences were
similar to those of ECorL (Figure 1(b)), i.e. they
preferentially bound to the fucosyl-N-acetyllactosamine-terminated
compound.

The finding that the carbohydrate binding
properties of ECL (Figure 1(a)) differ from those of
the other members of the Erythrina lectin family,
despite their high homology, prompted us to
attempt the crystallographic analysis of ECL, both
in complex with lactose and fucosyllactose, in
order to structurally characterize the underlying
protein-carbohydrate interactions of this system.

Primary sequence 

The ECL sequence was obtained by mass spectrometry:
peptide mapping and tandem mass spectrometry24 of the
enzymatically digested protein purchased from Vector
Laboratories. The sequence determination was aided by
the available sequence of a different ECL isoform
(P. Thibault, personal communication). Sequence coverage
of 213 of 239 amino acid residues, or 89%, was obtained.
Comparison with the electron density obtained from the
crystallographic analysis showed excellent agreement
(Figure 2), except for residues 59 and 62. Residue 59 is
a methionine in the X-ray structure and an isoleucine, or
the isobaric leucine, according to the mass spectrometry
data, while residue 62 was modeled as a serine in the
X-ray structure but is a methionine according to the mass
spectrometry results. The difference at the former residue
most likely reflects the presence or dominance of
different ECL isoforms in the crystal and in solution,
respectively, while for the latter it cannot be excluded
that it represents an artifact caused by crystal disorder.
Overall, the ECL primary sequence is highly similar to
that of ECorL, with a sequence identity of approximately
97%. The combining site residues (i.e. all residues that
interact either directly with the carbohydrate ligands or
indirectly via a water molecule) exhibit 100% sequence
identity.
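The two percentages in this paragraph are simple ratios, and the sketch below shows one way to compute them. It is a minimal illustration under stated assumptions, not the procedure used in the study. The function names are hypothetical, the alignment is assumed to be gap-free with one-letter residue codes, and the toy sequences are invented placeholders rather than the ECL or ECorL sequences.

```python
def coverage_percent(covered: int, total: int) -> float:
    """Share of residues (in %) observed, e.g. by peptide mapping."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * covered / total


def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two equal-length, gap-free aligned sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("need two non-empty aligned sequences of equal length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def substitutions(seq_a: str, seq_b: str, start: int = 1) -> list[str]:
    """Differences in the 'A54T' style: residue in seq_a, position, residue in seq_b."""
    return [
        f"{a}{pos}{b}"
        for pos, (a, b) in enumerate(zip(seq_a, seq_b), start=start)
        if a != b
    ]


# Coverage quoted in the text: 213 of 239 residues.
print(round(coverage_percent(213, 239)))  # 89

# Six substitutions over 239 positions, assuming a gap-free full-length alignment.
print(round(100.0 * (239 - 6) / 239, 1))  # 97.5

# Toy sequences only; these are not real lectin sequences.
print(percent_identity("ADGNA", "ADGNT"), substitutions("ADGNA", "ADGNT"))  # 80.0 ['A5T']
```

Six differences (the A54T, I111V, G125A, S175R, A181V and H206D substitutions discussed later in the article) among 239 aligned residues give about 97.5%. This agrees with the "approximately 97%" quoted above, assuming a gap-free alignment. The authors' exact counting convention is not stated.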



Structural analysis 

Accuracy of crystal structures 

The crystal structures of the ECL complexes with
lactose and fucosyllactose have been determined at
1.6 Å and 1.7 Å, respectively, based on high-quality,
strong and highly redundant data. At this high resolution
it is possible to interpret many details in
the electron density maps, including the positions
of water molecules and the existence of several
alternative conformations for protein side-chains
or hydroxyl groups of the carbohydrate ligands.
Figure 3. Electron density for lactose (a) and fucosyllactose (b) in the respective ECL complexes. 2Fo - Fc simulated-annealing
omit maps covering the saccharides, displayed at 1σ.



The final R-factors for the ECL structures are
20.8% (Rfree = 22.6%) and 20.8% (Rfree = 23.9%) for
the lactose and fucosyllactose complexes,
respectively. Rms deviations from ideal geometry
are 0.018 Å/0.014 Å and 1.90°/1.96° for the
respective bond lengths and angles of the two
complexes. Finally, the Ramachandran plots25 are
consistent with geometrically well-defined
structures (Table 1). All of these values reflect the
high quality of the structural models. At the same time,
the structures of the ECL complexes are well defined
by electron density for essentially all 239 amino acid
residues identified by mass spectrometry, as well as
for the carbohydrate ligand (Figure 3). The only
slightly weaker points of the structure are at the
2-fold symmetry axis around His180. The good
agreement between the density and the structural
model is also reflected in the average real-space
correlation coefficients of 92.4% and 91.6%,
respectively, for simulated annealing omit maps of the
lactose and fucosyllactose complexes.
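For readers outside crystallography, the R-factors quoted here follow the conventional definition R = Σ | |Fobs| - |Fcalc| | / Σ |Fobs|, summed over reflections. Rfree is computed the same way on a small test set of reflections excluded from refinement. The sketch below is a minimal illustration that assumes the calculated amplitudes are already on the same scale as the observed ones. The function names and toy amplitudes are hypothetical and are not the ECL data.

```python
def r_factor(f_obs: list[float], f_calc: list[float]) -> float:
    """Conventional R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    Assumes f_calc has already been scaled to f_obs.
    """
    if len(f_obs) != len(f_calc) or not f_obs:
        raise ValueError("need two non-empty amplitude lists of equal length")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def r_work_and_free(f_obs, f_calc, is_test):
    """R for the working set and for the test (free) set flagged by is_test."""
    work = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, is_test) if not t]
    test = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, is_test) if t]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


# Hypothetical amplitudes for four reflections; the last one is in the test set.
f_obs = [100.0, 200.0, 50.0, 150.0]
f_calc = [90.0, 210.0, 55.0, 120.0]
flags = [False, False, False, True]

print(round(r_factor(f_obs, f_calc), 3))  # 0.11
r_work, r_free = r_work_and_free(f_obs, f_calc, flags)
print(round(r_work, 3), round(r_free, 3))  # 0.071 0.2
```

Multiplying by 100 gives the percentage form used in the text. Real refinement programs also handle scaling, weighting and resolution binning, which this sketch leaves out.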

ECL structure 

Tertiary and quaternary structures. As in the case of
ECorL and other legume lectins, the ECL fold consists
of two stacked antiparallel β-sheets (Figure 4). In
addition to the β-sheets, ECL exhibits one α-helical
turn between strands β11 and β12, according to
analysis using the program DSSP.26 Similar to ECorL,
WBA-I and WBA-II, ECL forms a non-canonical dimer,
in which two monomers are arranged back-to-back, as
if engaged in a "handshake". The lectin dimer
interface is characterized by extensive, mostly
hydrophobic interactions between the two monomers.
Since ECL crystallized with only one molecule per
asymmetric unit, the two protomers are
crystallographically equivalent.

Figure 4. Topology of ECL (complex with lactose). ECL residues
that differ from ECorL are shown explicitly. In addition, Val92, the
interaction partner of residues 111 and 125, is highlighted in orange.

Both ECL and ECorL have two N-glycosylation
sites, at Asn17 and Asn113.† Mass spectrometry
analysis of ECL revealed that both residues are
linked to a heptasaccharide of the structure
(deoxyhexose)₁(pentose)₁(hexose)₃(N-acetylhexosamine)₂,
with partial occupancy at Asn113. At both
sites, the electron density for the attached
saccharide is rather weak, revealing only the rough
positions of one to three sugar residues. For this
reason, none of the sugar rings has been modeled.
This is in contrast to the situation in ECorL, where
the heptasaccharide linked to Asn17 is exceptionally
well defined by electron density.7 A comparison
of the crystal environments of ECL and ECorL
revealed packing interactions as the origin of these
differences. While the N-linked saccharide in
ECorL is involved in extensive interactions with
the Asn17-linked heptasaccharide of a symmetry-related
molecule, the same residue in the ECL crystal
structure is highly solvent-exposed.

† It was noted that glycosylation at Asn113 could affect
the carbohydrate binding properties of the lectins under
investigation, as the bound heptasaccharide is located on
the same side of the lectin as the combining site.
Glycosylation of Asn113 could therefore critically
influence lectin binding to glycoconjugates and would
then be insufficiently modeled by experiments involving
only small saccharide analogs like lactose and
fucosyllactose. However, previous experiments with
recombinant ECorL, which is not glycosylated, show that
no differences in binding properties are observed
compared to the native lectin,6 thus confirming the
validity of our system.

It has been the focus of some debate whether
glycosylation of Asn17 prevents the formation of
the canonical dimer. On the basis of the ECL
structure, this possibility cannot be ruled out,
although the nature of the protein surface involved
in dimerisation suggests that the quaternary structure
is determined to a significant extent by factors
intrinsic to the protein itself, independent of
glycosylation.

The carbohydrate-binding site. The ECL combining
site is located in a shallow, highly solvent-exposed
depression on the surface of each protomer. Several
highly conserved amino acid residues form the site
and provide the contacts necessary for a number of
strong interactions with the carbohydrate ligands
(see Figure 5). In particular, Asp89, Gly107 and
Asn133 of the conserved Asp-Gly-Asn triad create
the basis for a strong hydrogen-bonding network.
Additional important interaction partners of the
carbohydrate ligands are the aromatic ring of
Phe131, which provides hydrophobic stacking
interactions with galactose C3/4/5, and the
main-chain carbonyl oxygen of Leu86, which interacts
with galactose O6 via a structural water molecule.
At the periphery of the combining site, Ala218
and Gln219 from the specificity loop D engage in
somewhat weaker hydrogen-bonding interactions
with the disaccharide.

Figure 5. ECL combining site.
(a) and (b) Stereo pictures showing
a superposition of the combining
sites of (a) ECL (blue) and ECorL
(white/red) in complex with
lactose, and (b) ECL in complex with
lactose (blue) and fucosyllactose
(white/red), respectively. Colors of
the loops correspond to those in
Figures 3 and 4. Hydrogen bonds
between the saccharide and ECL
are indicated. (c) and (d) 2D plots
of the combining sites of ECL in
complex with lactose (c) and
fucosyllactose (d), respectively.
Water molecules of ECorL which
are not present in the ECL combining
site are indicated as yellow
circles.

The carbohydrate binding activity of legume lectins
depends on the presence of two metal ions, a
calcium ion and a transition metal ion (usually
manganese), which, although not involved in
directly binding the carbohydrate ligands, help to
position the amino acid residues in contact with
them. The metal ion coordination in ECL is identical
with that in ECorL, involving four protein ligands
and four water molecules. The four metal-coordinated
water molecules are conserved in all legume
lectins inspected,11 one of them being essential for
stabilizing the unusual Ala88-Asp89 cis-peptide
bond typical for legume lectins.

Table 2. B-factors for water molecules and saccharide units in the combining sites of the ECorL-Lac, ECL-Lac and
ECL-FucLac complexes (structurally equivalent water molecules are placed on the same line)

ECL and ECorL

Overall comparison. Despite the fact that ECL and
ECorL crystallized in different space groups, their
structures are almost superimposable, with rms
differences between Cα atoms of the two lactose
complexes of 0.37 Å (0.28 Å if only residues
65-220 are considered). In the combining sites of
ECL and ECorL, all direct ligands of lactose, as well as
the indirect ones acting via water molecules, are conserved
at identical positions within the error limits
of the coordinates (Figure 5(a) and (c)). This is also
true for the metal binding sites. The only
noticeable differences involve the orientation of
Tyr106 in the second shell around the
monosaccharide-binding site, the orientation of the
glucose ring of the lactose molecule, and the
positions of a few water molecules (ECL 126, 457,
524 and 563; Figure 5(c)). Overall, the water
network is somewhat strengthened in ECL, as
judged from the larger number of interaction
partners, shorter H-bonding distances and lower
relative B-factors (Table 2). The same observation
is valid for the glucose unit of lactose, which is
less disordered in the ECL complex, whereas the
galactose unit is somewhat more ordered in
ECorL (Table 2).
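The Cα rms differences quoted above are root-mean-square deviations, rmsd = sqrt((1/N) Σ d_i²), where d_i is the distance between the i-th pair of matched atoms after superposition. The sketch below assumes the two models have already been superposed and that atoms are matched by residue number. It does not perform the least-squares fit itself. It also shows how restricting the calculation to a residue window, as with residues 65-220 in the text, can lower the value when deviating residues fall outside it. The function name and coordinates are hypothetical.

```python
import math

Coord = tuple[float, float, float]


def rmsd(model_a: dict[int, Coord], model_b: dict[int, Coord], residues=None) -> float:
    """RMSD between already superposed atoms matched by residue number."""
    # Compare only residues present in both models, optionally within a subset.
    common = sorted(set(model_a) & set(model_b))
    if residues is not None:
        wanted = set(residues)
        common = [r for r in common if r in wanted]
    if not common:
        raise ValueError("no matched residues to compare")
    squared = sum(math.dist(model_a[r], model_b[r]) ** 2 for r in common)
    return math.sqrt(squared / len(common))


# Hypothetical C-alpha coordinates (in angstroms) for three residues.
model_a = {64: (0.0, 0.0, 0.0), 65: (1.0, 0.0, 0.0), 66: (2.0, 0.0, 0.0)}
model_b = {64: (0.0, 0.0, 2.0), 65: (1.0, 0.0, 0.3), 66: (2.0, 0.0, 0.4)}

print(round(rmsd(model_a, model_b), 2))                  # 1.19
print(round(rmsd(model_a, model_b, range(65, 221)), 2))  # 0.35
```

Here the single poorly matching residue (64) lies outside the 65-220 window, so excluding it drops the rmsd. This is analogous in spirit, though not in magnitude, to the 0.37 Å versus 0.28 Å values above.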

Analysis of amino acid differences. ECL and ECorL
exhibit only a few differences in their primary
sequences. These are A54T, I111V, G125A, S175R,
A181V and H206D (ECorL residue first, ECL residue
second) (Figures 2 and 4). All of these substitutions
have a very limited effect on the three-dimensional
structure. The only substitutions that are within reach
of the carbohydrate-binding site, and therefore
have the potential to cause the altered binding
affinities compared to ECorL, are I111V and
G125A. These two residues, although distant in
sequence, are very close to each other in space,
with their side-chains facing each other (Figure 4).
They are part of a van der Waals interaction network
also involving Val92, Phe112, Phe94, Val123
and, more peripherally, Leu109 and Ile150, in an
internal hydrophobic pocket of the protein. Val92
forms a bridge between residues 111 and
125 by engaging in van der Waals interactions
with the side-chains of both. The two
"volume-conserving" substitutions at positions
111 and 125 compensate for each other, with Val92
adapting to the change by a slight rotation of its
side-chain. Even though the overall changes
induced by these substitutions are also small,
they are nevertheless the strongest candidates for
causing long-range effects on carbohydrate binding,
due to their location and their unique occurrence
in the sequence compared to other Erythrina lectins
(M. Sharon et al., unpublished results). In such a
scenario, Val92 might mediate these effects by
acting as a handle that is pulled in order to transmit
the changes to the combining site. Alternatively,
a substitution of residue 111 might affect
the combining site more directly, through residues
106-108. Single or double mutants of residues 111
and 125 of ECL or ECorL may shed light on this
possibility.
Binding of lactose versus fucosyllactose

A superposition of the lactose and fucosyllactose
complexes of ECL reveals no major rearrangements
in the combining site (Figure 5(b)). In fact,
both complexes superimpose extremely well, with
rms differences of 0.12 Å for the Cα atoms of
residues 65-220 (rmsd = 0.15 Å for all atoms).
The two lactose units also superimpose
quite well, except for a small difference in the
relative orientations of the glucose versus the
galactose rings and a concomitant difference in relative
B-factors for the galactose unit (Table 2).

The fucose unit, which is linked to lactose via the
C2-position of galactose, is positioned in a shallow
hydrophobic cavity adjacent to the
monosaccharide-binding site, shaped by Pro134, Trp135, Tyr108
and Tyr106. This cavity is connected to the primary
site by the side-chains of Asn133 on one side
and Ala218 on the other. These two residues
form a gate of [illegible].2 Å in width, through which the
fucose unit stretches into its binding site. Similar,
though not identical, positioning of fucose was
described in modeling studies of ECorL with
fucosyl-N-acetyllactosamine.

Upon binding of the fucose moiety, seven water
molecules are displaced from the ECL combining
site, four of which are conserved between ECL
and ECorL (compare Table 2). This loss is
compensated for to a limited extent by the addition of
three new water molecules, two of which engage
in strong hydrogen bonds with the 2-OH and
4-OH groups of the fucose unit (Figure 5(c) and
(d)). Two additional hydrogen bonds of the fucose
unit are to the protein (Figure 5(d)). The fucose
2-OH is hydrogen bonded to Asn133 Nδ2, which
also interacts with the 3-OH group of galactose
and with the glycosidic oxygen of the Fucα2Gal
linkage, while the fucose 4-OH is H-bonded to the
hydroxyl group of Tyr108. In order to engage in
the latter interaction, the side-chain of Tyr108 is
slightly pulled into the combining site, compared
to its position in the ECL lactose complex.
Interestingly, ELLA studies showed that a 4-methoxy
derivative of fucosyllactose caused a small increase in
affinity to ECorL.9 Our results on the ECL structure in
complex with fucosyllactose suggest that this
increase in affinity probably results from van der
Waals interactions of the 4-O-methyl group with
the fucose atoms C2/C3/C5 and possibly also
with the Tyr108 side-chain. No hydrogen bonding
interactions are observed for the fucose 3-OH
group in the ECL complex, in agreement with
results from ELLA studies for ECorL on a
3-deoxygenated compound.9

In addition to the hydrogen bonds described
above, the fucose moiety engages in several strong
hydrophobic interactions with the lectin, notably
its 2-OH with Pro134 (3.3 Å). A strong hydrophobic
interaction of the fucose 2-hydroxy group
has been postulated by Lemieux et al. (however,
involving Trp135) for ECorL and
fucosyl-N-acetyllactosamine, on the basis of comparative ELLA
studies.9 Interestingly, a strong decrease rather
than an increase in binding for the fucose 2-deoxy
analog was found in a study on the related lectin
WBA-II, which points to differences in the binding
interactions of WBA-II compared to ECL and
ECorL. Two further strong hydrophobic interactions
of fucosyllactose involve the fucose O5/C4/C5, which
lie within 3.3-3.8 Å of the glucose 6-OH, and
the fucose methyl group at the C6 position. The
latter group creates a distinctly nonpolar surface at
C4/C5/C6, which has been speculated to be involved
in the interaction with one of the hydrophobic
residues close to the monosaccharide-binding
site.10 Indeed, the fucose methyl group is positioned
close to the Tyr106 side-chain.

Additional interactions of the fucose are of a
weaker nature. They involve intramolecular
interactions with the glucose C6/O6 as well as
intermolecular ones with the hydrophobic residues
lining the fucose-binding cavity. Curiously,
although three of the residues in this cavity are
aromatic (Tyr106, Tyr108 and Trp135), none of
them is involved in hydrophobic stacking
interactions with the saccharide, which are so typical
for protein-carbohydrate complexes. Instead,
contacts are mainly established through atoms on
the rim of the aromatic residues or of Pro134.
Compared to the lactose complex, the side-chain of
Tyr106 is slightly reoriented, such that it actually
moves slightly away from the fucose residue. The
side-chain of Trp135 is held in position through a
stacking interaction of the 6-ring portion of its
indole with Asn133 Cβ, an interaction also present
in the ECL lactose complex.

Comparison of the fucosyllactose complexes of ECL and UEA-II

To date, only one other crystal structure of a legume lectin has been solved in complex with fucosyllactose, namely that of lectin II from Ulex europaeus (UEA-II). UEA-II belongs to the chitobiose specificity group, but exhibits a promiscuous carbohydrate-binding site with highest affinity for fucosyllactose.³⁶ A comparison of the fucosyllactose complexes of ECL and UEA-II reveals qualitatively similar interactions of the fucose residue, even though the saccharide units have completely different relative orientations. The fucose 2-OH and 4-OH groups are involved in hydrogen bonds with the protein (UEA-II residues Gly106 and Ser104), and van der Waals interactions occur with several aromatic residues as well as with glucose C6/O6. Even the strong hydrophobic interaction of one of the fucose hydroxyls is conserved, though it engages the fucose 3-OH and Trp138 (corresponding to ECL residue Trp135) instead of the fucose 2-OH and Pro134, as in ECL. These interactions, although qualitatively similar, have a completely different structural basis, both with respect to the conformation of the trisaccharide and regarding the protein architecture of the combining site. The largest difference concerns ECL residue Tyr106 and UEA-II residue Tyr135, which are located on opposite sides of the fucose rings. Neither of these residues has a structural counterpart in the other lectin; Tyr135 of UEA-II even lies in the solvent region of the ECL carbohydrate complex.

The structure of the UEA-II combining site may in fact more closely resemble the combining site of WBA-II, as judged from the results of a thermodynamic study of the latter lectin. These studies revealed a slight increase in the binding constant of 3-deoxy fucosyl-N-acetyllactosamine, while the 3-methoxy analog bound significantly worse, which suggests that the 3-OH group is involved in a strong hydrophobic interaction with the lectin.

Correlation of structural and 
thermodynamic data 

The results of the microcalorimetry experiments are summarized in Table 3; a typical experiment is shown in Figure 6. A comparison of the thermodynamic parameters underlying the binding of galactose, lactose and fucosyllactose revealed similar tendencies for ECL and ECorL: while ΔH is more negative for the binding of lactose than for galactose and fucosyllactose, ΔS is significantly more favorable for complexation with fucosyllactose or galactose than for lactose binding. Whereas the thermodynamic data for the binding of fucosyllactose are very similar for ECL and ECorL, titrations with galactose and lactose revealed significant differences, especially in the entropic terms. The difference in binding constants of ECL and ECorL for lactose and fucosyllactose is much smaller than expected from the experiments on solid-phase immobilized glycolipids.† While binding to fucosyllactose is very similar for ECL and ECorL, binding to lactose differs by at most a factor of two, in agreement with previous results from microcalorimetry experiments and hemagglutination inhibition assays.

† The differences between the results of the microtiter well assays and microcalorimetry may be due to the different sensitivities of the two methods. As microtiter well assays are rather insensitive and might only detect binding above a certain threshold, the apparent differences between the two methods can easily be explained by a threshold corresponding to a binding constant K_b between 2000 M⁻¹ and 3000 M⁻¹, and therefore do not influence our interpretation of the microcalorimetric data.

Knowledge of both the structures of several closely related lectin–carbohydrate complexes and the corresponding thermodynamic data should make it possible to correlate some of the differences in thermodynamic parameters with structural differences. While changes in enthalpy (ΔH) are generally interpreted in terms of differences in hydrogen bonding, electrostatics and van der Waals interactions, entropy changes (ΔS) are essentially attributed to differences in hydrophobic interactions and mobility.
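As a reminder of how the quantities in Table 3 relate to one another, the standard relations ΔG = −RT ln K_b and ΔG = ΔH − TΔS split a measured binding constant and enthalpy into free-energy and entropic terms. The short Python sketch below applies them; the function name and the input values are hypothetical (not taken from Table 3), and K_b in M⁻¹ is assumed to be referenced to a 1 M standard state.

```python
import math

R = 8.314462618  # gas constant, J mol^-1 K^-1

def binding_thermodynamics(k_b, dh, temperature=298.15):
    """Return (dG, T*dS, dS) in J/mol, J/mol and J/(mol K).

    k_b: binding constant in M^-1 (1 M standard state assumed)
    dh:  binding enthalpy in J/mol
    """
    dg = -R * temperature * math.log(k_b)  # dG = -RT ln K_b
    t_ds = dh - dg                          # rearranged from dG = dH - T dS
    return dg, t_ds, t_ds / temperature

# Hypothetical example: K_b = 1000 M^-1 and dH = -30 kJ/mol at 298.15 K
dg, t_ds, ds = binding_thermodynamics(1000.0, -30000.0)
print(f"dG = {dg / 1000:.1f} kJ/mol, TdS = {t_ds / 1000:.1f} kJ/mol")
```

A negative TΔS in such a calculation corresponds to the entropically unfavorable binding discussed for lactose, while an enthalpy that is less negative than ΔG would indicate a favorable entropic contribution.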

The crystal structures of ECL and ECorL suggest that the general tendencies of the thermodynamic parameters for galactose, lactose and fucosyllactose binding correlate well with differences in the local water network in the lectin combining sites. Upon lactose binding, as compared to galactose binding, the local water network is strengthened, leading to a more favorable enthalpy due to an increased number of hydrogen bonds to the carbohydrate ligand as well as to the protein. This gain in enthalpy is coupled with a less favorable entropy term. Binding of the fucose unit goes hand in hand with the release of some of these strongly bound water molecules, which roughly returns the situation to the galactose case. Thus, upon binding of fucosyllactose compared to lactose, the enthalpy does not increase with the area buried, as was suggested by Elgavish & Shaanan, but in fact decreases. While for ECorL at least the free energy (−ΔG) increases slightly upon binding of fucosyllactose compared to lactose, neither of these values increases to a significant extent for ECL.

With respect to the entropic term, the binding of lactose is highly unfavorable, whereas the binding of galactose or fucosyllactose is more favorable. From a structural perspective, the entropic cost of lactose versus galactose binding can probably be related to the ordering of the combining site, not least due to the recruitment of structural water molecules to this site mentioned above. Thus, more water molecules enhance the negative binding enthalpy, but also reduce the disorder in the lectin–carbohydrate–solvent system, and hence the entropy change is negative. Upon binding of fucosyllactose, this order is not significantly reduced, but the entropic term is favorably influenced by the hydrophobic interactions of the fucose moiety. Most importantly, the fucose 2-OH engages in a strong hydrophobic interaction with Pro134, and the fucose methyl group interacts with Tyr106.

Figure 6. The results of a typical ITC experiment, which consisted of titrating 10 µl aliquots of lactose into a solution containing 0.0325 mM ECL in 20 mM phosphate and 150 mM sodium chloride buffer at 296 K. Also shown is the binding isotherm fit of the data to a one-site binding model.

While the thermodynamic parameters for ECL and ECorL follow similar general trends, a more thorough comparison reveals distinct differences with respect to the binding of galactose and lactose, as mentioned above. In both cases, the differences in binding constants are to a large extent due to different entropic contributions for ECL and ECorL, respectively. It is, however, not straightforward to explain these differences in structural terms, as the differences between the combining sites of ECL and ECorL are minimal. Possibly, the slightly higher relative B-factors for the galactose residue of lactose in ECL indicate a somewhat enhanced mobility of the monosaccharide-binding site in this lectin (leading to a more favorable entropic term), which might be caused by the substitutions at residues 111 and 125 and transmitted to the combining site via Val92. It would be interesting to probe this system with molecular dynamics simulations similar to those performed by Bradbrook et al.¹⁶
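Comparing B-factors between two independently refined structures usually requires some normalization, because overall B-factor levels differ between data sets. The text does not say how "relative" B-factors were defined; the sketch below assumes one common convention, the mean B of a group of atoms divided by the mean B of all atoms in that structure. The function name and all B-factor values are hypothetical.

```python
from statistics import mean

def relative_b(b_factors, subset):
    """Mean B of the atoms at the given indices divided by the mean B of all atoms."""
    return mean(b_factors[i] for i in subset) / mean(b_factors)

# Hypothetical per-atom B-factors (A^2); indices 3 and 4 stand in for galactose atoms
ecl_b = [15.0, 18.0, 20.0, 25.0, 22.0]
ecorl_b = [20.0, 24.0, 26.0, 28.0, 27.0]
print(relative_b(ecl_b, [3, 4]), relative_b(ecorl_b, [3, 4]))
```

In this toy example the ligand atoms of the first structure have lower absolute B-factors (mean 23.5 versus 27.5 Å²) but a higher relative B-factor (1.175 versus 1.1), which is the kind of distinction the normalization is meant to expose.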



Conclusions and Outlook 

In the present investigation, the crystal structures of two carbohydrate complexes of ECL have been solved at high resolution, revealing detailed structural information on the interactions of this lectin with lactose and fucosyllactose, respectively. The structures were compared with the crystal structure and with two molecular models of the highly homologous lectin ECorL in complex with fucosyl-N-acetyllactosamine, which was found to exhibit subtle but distinct differences in binding properties compared to ECL. An effort was undertaken to characterize the protein–carbohydrate interactions of this system in light of the thermodynamic data obtained by microcalorimetry. The results of this analysis point to an important role of the water network in the lectin combining site. We propose that two amino acid substitutions, at positions 111 and 125, are the cause of the differences in binding affinities between ECL and ECorL. Even though the two substitutions compensate for each other, they cause a slight rotation of the side-chain of the interacting residue Val92, which might transmit the changes to the combining site. This hypothesis can now be tested by creating hybrids of the two lectins that differ in positions 111 and 125.



Materials and Methods 

Isolation of Erythrina lectins

The lectins from E. rubrinervia, E. vespertilio, E. lysistemon, E. caffra and E. flabelliformis were purified by affinity chromatography on lactose-Sepharose, as described. ECL was purchased from Vector Laboratories Inc. (Burlingame, CA) and Sigma (St Louis, MO), while ECorL was obtained from Sigma. The lectins were diluted to 1 mg/ml in phosphate-buffered saline (PBS, pH 7.3) containing 6 mM phosphate buffer (pH 7.3), 0.14 M NaCl and 4 mM KCl. Aliquots of 100 µg were labeled with ¹²⁵I using Na¹²⁵I (100 mCi/ml; Amersham Pharmacia Biotech, Little Chalfont, UK), according to the IODOGEN protocol of the manufacturer (Pierce, Rockford, IL). Approximately 5 × 10^… cpm/µg protein was obtained.



Reference glycosphingolipids

The glycosphingolipids used in the carbohydrate binding studies were isolated by standard methods and characterized by mass spectrometry and proton NMR, as described.



Microtiter well assay

The microtiter well binding assay was performed as described.¹⁹ In brief, serial dilutions (each dilution in triplicate) of pure glycosphingolipids in methanol were applied to microtiter wells (Cooke M24, Nutacon, Holland). When the solvent had evaporated, the wells were blocked for two hours with 200 µl of PBS containing 2% (w/v) bovine serum albumin and 0.1% (w/v) NaN₃ (Sol 1). Thereafter, 50 µl of radiolabeled Erythrina lectins diluted in Sol 1 (approximately 2 × 10³ cpm/µl) were added per well and incubated for four hours at room temperature. After washing six times with PBS, the wells were cut out and the radioactivity was counted in a gamma counter.

Isothermal titration calorimetry (ITC) measurements

The thermodynamic parameters for the binding of galactose, lactose and fucosyllactose (Dextra Laboratories Ltd) to ECL (Sigma†) at 298.15 K were determined by ITC using a MicroCal Inc. VP-ITC. The VP-ITC consists of a matched pair of sample and reference vessels (1.409 ml) in an adiabatic enclosure and a rotating stirrer-syringe for titrating aliquots of the ligand solution into the sample vessel. The sample vessel contained 0.02–0.052 mM of the lectin solution, while the reference vessel contained only the 20 mM NaPO₄, 0.15 M NaCl (pH 7.4) buffer solution. Aliquots (10 µl) of the carbohydrate solution (at concentrations of 1.0–12.7 mM) were titrated four minutes apart into the lectin sample solution until the heat exchanged under the titration peak was reduced by a factor of 5–10. Any contribution of the carbohydrate heat of dilution to the binding enthalpy was determined by titrating the carbohydrate ligand solution directly into the buffer solution in the sample vessel. The heats of dilution were then subtracted from the heats obtained during the titration prior to analysis of the data.

† While the carbohydrate binding assays, mass spectrometry and crystallization experiments were all performed using ECL purchased from Vector Laboratories Inc., the microcalorimetry experiments were done with ECL from Sigma. The reasons for this difference were purely practical, as the lectin preparation from Vector contained large amounts of lactose and was unstable in the absence of carbohydrate. To ensure that all experiments in the present study were compatible, we therefore compared the binding pattern of dissolved ECL crystals with that of the commercially obtained samples from both sources. The curves obtained were very similar to Figure 1(a) (data not shown), reassuring us that functional conclusions drawn from the crystal structures of the ECL complexes have a valid basis.

A non-linear least-squares minimization program from MicroCal, Inc. (Origin 5.0) was used to fit the incremental heat of the ith titration, ΔQ(i), of the total heat, Q_t, to the total titrant concentration, X_t, with the stoichiometry fixed at n = 1.0, according to the following equations:

$$Q_t = \frac{C_t\,\Delta H_b\,V}{2}\left[1 + \frac{X_t}{C_t} + \frac{1}{K_b C_t} - \sqrt{\left(1 + \frac{X_t}{C_t} + \frac{1}{K_b C_t}\right)^2 - \frac{4X_t}{C_t}}\,\right]$$

$$\Delta Q(i) = Q(i) + \frac{dV_i}{V}\left[\frac{Q(i) + Q(i-1)}{2}\right] - Q(i-1)$$

where C_t is the total lectin concentration in the sample vessel, V is the volume of the sample vessel and dV_i is the volume of the ith injection, to yield values of K_b and ΔH_b. The uncertainties reported for K_b and ΔH_b are the standard deviations from the average value, determined from at least two different titrations.
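The one-site model above can be reproduced in a few lines. The Python sketch below is a hypothetical illustration, not the authors' Origin 5.0 procedure: it evaluates Q_t and ΔQ(i) for n = 1 in SI-style units (M, J/mol, L) and, because ΔQ(i) is linear in ΔH_b for a fixed K_b, fits the data by scanning log10 K_b on a grid and solving for ΔH_b by linear least squares at each grid point. As an additional assumption not stated in the text, the total concentrations after each injection are corrected for liquid displaced from an overfilled cell, using the commonly applied expressions C_t = C_0(1 − v/2V)/(1 + v/2V) and X_t = X_0(v/V)(1 − v/2V), where v is the cumulative injected volume. All function names are hypothetical, and the concentrations and "true" parameters are illustrative values chosen within the ranges given in the text.

```python
import math

def total_heat(x_t, c_t, k_b, dh_b, volume):
    """Q_t (J) for 1:1 binding; x_t, c_t in M, k_b in 1/M, dh_b in J/mol, volume in L."""
    r = x_t / c_t
    b = 1.0 + r + 1.0 / (k_b * c_t)
    return 0.5 * c_t * dh_b * volume * (b - math.sqrt(b * b - 4.0 * r))

def injection_heats(n_inj, dv, x0, c0, k_b, dh_b, volume):
    """Incremental heats dQ(i) for n_inj injections of dv litres each."""
    heats, q_prev = [], 0.0
    for i in range(1, n_inj + 1):
        v_tot = i * dv                             # cumulative injected volume
        f = v_tot / (2.0 * volume)                 # assumed displacement correction
        c_t = c0 * (1.0 - f) / (1.0 + f)           # total lectin in the cell
        x_t = x0 * (v_tot / volume) * (1.0 - f)    # total ligand in the cell
        q = total_heat(x_t, c_t, k_b, dh_b, volume)
        # dQ(i) = Q(i) + (dV_i / V) * (Q(i) + Q(i-1)) / 2 - Q(i-1)
        heats.append(q + (dv / volume) * (q + q_prev) / 2.0 - q_prev)
        q_prev = q
    return heats

def fit_one_site(observed, n_inj, dv, x0, c0, volume, log_k_grid):
    """Grid search over log10(K_b); dH_b solved by linear least squares at each K_b."""
    best = None
    for log_k in log_k_grid:
        k_b = 10.0 ** log_k
        shape = injection_heats(n_inj, dv, x0, c0, k_b, 1.0, volume)  # heat per J/mol
        dh_b = sum(o * s for o, s in zip(observed, shape)) / sum(s * s for s in shape)
        ssr = sum((o - dh_b * s) ** 2 for o, s in zip(observed, shape))
        if best is None or ssr < best[0]:
            best = (ssr, k_b, dh_b)
    return best[1], best[2]

# Hypothetical run: 25 x 10 ul injections of 5 mM ligand into 0.0325 mM lectin, 1.409 ml cell
V, DV, X0, C0 = 1.409e-3, 10e-6, 5e-3, 3.25e-5
synthetic = injection_heats(25, DV, X0, C0, 2000.0, -30000.0, V)
grid = [2.0 + 0.01 * j for j in range(401)]
k_fit, dh_fit = fit_one_site(synthetic, 25, DV, X0, C0, V, grid)
print(f"K_b ~ {k_fit:.0f} M^-1, dH_b ~ {dh_fit / 1000:.1f} kJ/mol")
```

Because the synthetic data are noise-free, the grid search should land on the grid point nearest the generating K_b of 2000 M⁻¹; with real titrations, a proper nonlinear optimizer and replicate experiments, as described in the text, are needed to obtain meaningful uncertainties.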

Peptide mapping by mass spectrometry

For enzymatic digestion, approximately 10 nmol of ECL (Sigma, St Louis, USA) was dissolved in 500 µl of buffer containing 10 mM n-octylglucoside, 0.1 mM CaCl₂ and 0.1 M NH₄HCO₃, to which 100 pmol of modified trypsin (Promega, Wisconsin, USA) had been added. Digestion of the protein proceeded at 38 °C for ten hours. The mixture was dried in a SpeedVac (Savant, Inc.).

Samples were analyzed using a Micromass TofSpec E MALDI-TOF mass spectrometer (Micromass, Manchester, UK) equipped with a pulsed 337 nm nitrogen laser, a delayed-extraction ion source and a reflectron. All spectra were acquired in reflectron mode at an accelerating voltage of 20 kV and were the sum of 100 laser shots. External calibration was performed with the monoisotopic masses of angiotensin II and ACTH (18–39). Spectra were analyzed using the MassLynx (Micromass) software in a Windows NT environment.

Fragment ion data were acquired on an electrospray quadrupole time-of-flight instrument (Q-Tof, Micromass, Manchester, UK). The protein digest was dissolved in 3 µl of acetonitrile/water (1:1, v/v) containing 0.1% formic acid and sprayed from gold-coated glass capillaries in a nanoflow source. Argon was used as the collision gas. Instrument calibration was performed using fragment ions from Glu-fibrinopeptide B and a fourth-order polynomial fit. MS/MS spectra were post-processed with the MaxEnt software (Micromass). The sequencing results are given in Figure 2.
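Peptide mapping of this kind compares observed peptide masses with masses predicted from the sequence. As a hedged illustration (not a procedure described in the text), the Python sketch below applies the usual trypsin cleavage rule (after Lys or Arg, but not before Pro), allows no missed cleavages, and computes monoisotopic [M+H]⁺ values from standard residue masses. It ignores modifications such as glycosylation or disulfides, and the input sequence is a toy string, not ECL; the function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the free termini
PROTON = 1.00728   # for the singly protonated [M+H]+ ion

def tryptic_peptides(sequence):
    """Cleave C-terminal to K/R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides

def mh_plus(peptide):
    """Monoisotopic [M+H]+ of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON

for pep in tryptic_peptides("GASKARPLEK"):
    print(pep, round(mh_plus(pep), 4))
```

In practice, the predicted list would be matched against MALDI peaks within a mass tolerance, and unmatched peaks would point to sequence differences, missed cleavages or modifications.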

Crystallization 

ECL containing 13% lactose by weight (Vector Laboratories, Inc., Burlingame, CA) was dissolved in 20 mM HEPES, 100 mM NaCl to a concentration of 10 mg/ml. Crystallization experiments were performed using the hanging-drop vapor-diffusion method at room temperature. The first screen (Crystal Screen, Hampton Research) yielded small tetragonal pyramidal or bipyramidal crystals from 2 M ammonium sulfate, 0.1 M Tris-HCl (pH 8.0). The crystal size was improved by macroseeding into a pre-equilibrated protein solution containing 1.… M ammonium sulfate, 0.1 M Tris-HCl (pH …) and 15% glycerol. Crystals grew to a maximum size of 0.2 mm × 0.4 mm × 0.4 mm.
For cocrystallization with fucosyllactose (Dextra Laboratories Ltd), the lectin had to be diluted and concentrated in the buffer containing the ligand. Crystals were obtained with the same procedure as described above.



Data collection and processing

Preliminary X-ray data collection to 1.6 Å resolution was carried out at AstraZeneca R&D Mölndal, on a rotating copper anode (Rigaku RU300 HB) equipped with a Mar 345 image plate (at T = 100 K). The crystals belonged to a tetragonal space group with cell dimensions a = b = 81.8 Å and c = 126.1 Å, containing one molecule per asymmetric unit. After the structure of the lactose complex had been solved, a high-resolution data set (including reflections to 1.45 Å resolution) was obtained at the synchrotron MAX-lab II in Lund, at beamline I711, also equipped with a Mar 345 image plate. Data were collected at cryogenic temperature (100 K). The wavelength for data collection was set to 1.03 Å and the crystal-to-detector distance to 200 mm. Data collection proceeded in two steps. First, a high-resolution data set was collected covering 90°, with oscillation ranges of 0.…°. A quick second sweep covered another 90° of reciprocal space (oscillation range 1.25°), ensuring high completeness also of the low-resolution data.

The data for the ECL complex with fucosyllactose were collected in a similar way, however with oscillation ranges of 1.0° and 1.25°, respectively. The crystal diffracted to 1.7 Å resolution.

All data were processed and scaled with XDS. Data collection statistics are summarized in Table 1.



Structure determination and refinement

The structure of ECL in complex with lactose was solved by molecular replacement using the program AMoRe. Structure determination was straightforward, without the need to adjust many parameters. As a search model, we used the crystal structure of ECorL (PDB entry 1AX0), from which the calcium and manganese ions, the carbohydrate ligand and all water molecules had been removed. Refinement was carried out with CNS. After the first refinement cycle, the calcium and manganese ions were added to the model. Simulated-annealing omit maps were calculated in order to remove any model bias from the electron density maps used for model building. The electron density maps clearly showed the presence of the lactose ligand even at the earliest stages of refinement. Refinement (including simulated annealing in the first rounds) alternated with cycles of manual rebuilding using the computer graphics package O. Parameter and topology files for the carbohydrates were taken from the HIC-Up server,⁴⁴ and some of the weights were adjusted as necessary. After several rounds of refinement, water molecules were added to the structure if the corresponding (F_o − F_c) density was at least 3σ and the geometric requirements for hydrogen bonding were fulfilled. In the final stages of the refinement, alternative conformations were defined for residues 9, 10, 12, 95, 159, 173, 180 and 234, as well as for glucose O1. The electron density for glucose O1 was found to be consistent with a mixture of the α and β anomers. As the R-factors increased noticeably for data beyond 1.56 Å resolution and the electron density maps at higher resolution did not reveal much additional information, the data were cut at 1.58 Å resolution in the final refinement cycle.
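The water-placement rule above (difference density of at least 3σ plus acceptable hydrogen-bonding geometry) can be expressed as a simple filter. In this Python sketch, the peak list and atom positions are hypothetical, and the 2.5–3.4 Å donor/acceptor distance window is an assumed stand-in for the "geometric requirements for hydrogen bonding", which the text does not quantify; refinement programs typically also check angles and steric clashes.

```python
import math

def accept_water(peak_sigma, peak_xyz, polar_atoms_xyz,
                 sigma_cutoff=3.0, d_min=2.5, d_max=3.4):
    """Keep a difference-density peak as a water if it is at least sigma_cutoff
    and lies within an (assumed) hydrogen-bonding distance window of a polar atom."""
    if peak_sigma < sigma_cutoff:
        return False
    return any(d_min <= math.dist(peak_xyz, atom) <= d_max for atom in polar_atoms_xyz)

polar_atoms = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]  # hypothetical O/N positions (A)
peaks = [
    (4.2, (2.9, 0.0, 0.0)),  # strong, 2.9 A from an oxygen -> accepted
    (2.1, (2.8, 0.0, 0.0)),  # below 3 sigma -> rejected
    (5.0, (5.0, 0.0, 0.0)),  # strong but 5 A from any polar atom -> rejected
]
waters = [xyz for sig, xyz in peaks if accept_water(sig, xyz, polar_atoms)]
print(waters)
```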

The structure of the ECL–fucosyllactose complex was determined using the apo-structure of the ECL–lactose complex as the starting model for rigid-body refinement, followed by simulated annealing. Refinement then proceeded as described above for the ECL–lactose complex, but without refining alternative conformations for protein side-chains. The fucosyllactose ligand was built into the density in two steps: first the lactose unit and later the fucose ring.



Structure comparisons

Comparisons of different lectin complexes were usually based on a superposition of the respective Cα coordinates of amino acid residues 65–230 using the program O.⁴⁸ For ECorL, the comparative analysis was based on PDB entry 1AX1 (complex with lactose at 1.95 Å resolution),⁹ unless stated otherwise. As an alternative to this "standard" superposition of residues 65–220, alignments involving only three conserved residues, e.g. Asp89, Gly107 and Asn133 or Asp89, Asn133 and Ala218, were also tested, with essentially the same results. The comparison of the fucosyllactose complexes of ECL and UEA-II (PDB entry 1QOX, monomer IN) was based on a superposition of the galactose residues.
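The quantity usually reported after such a least-squares Cα superposition is the root-mean-square deviation (RMSD). The Python sketch below computes the RMSD after optimal superposition using Horn's quaternion formulation, taking the largest eigenvalue of Horn's 4×4 key matrix by shifted power iteration. It is a hypothetical stand-in, not the algorithm implemented in O; it returns only the RMSD (no rotation matrix), and the coordinates in the test are toy values rather than lectin Cα atoms.

```python
import math

def superposed_rmsd(a, b, iterations=1000):
    """Minimum RMSD between equal-length 3D coordinate lists a and b over all
    proper rotations and translations (Horn's quaternion method)."""
    n = len(a)
    ca = [sum(p[k] for p in a) / n for k in range(3)]
    cb = [sum(p[k] for p in b) / n for k in range(3)]
    a = [[p[k] - ca[k] for k in range(3)] for p in a]  # center both sets
    b = [[p[k] - cb[k] for k in range(3)] for p in b]
    # Cross-covariance S[j][k] = sum_i a_ij * b_ik
    s = [[sum(p[j] * q[k] for p, q in zip(a, b)) for k in range(3)] for j in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    ga = sum(x * x for p in a for x in p)
    gb = sum(x * x for q in b for x in q)
    # (ga + gb) / 2 bounds every eigenvalue's magnitude, so after this shift the
    # largest eigenvalue is also the dominant one for power iteration.
    shift = 0.5 * (ga + gb)
    v = [0.5, 0.5, 0.5, 0.5]
    for _ in range(iterations):
        w = [sum(key[r][c] * v[c] for c in range(4)) + shift * v[r] for r in range(4)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient of the unshifted matrix estimates the largest eigenvalue
    lam = sum(v[r] * sum(key[r][c] * v[c] for c in range(4)) for r in range(4))
    return math.sqrt(max(0.0, (ga + gb - 2.0 * lam) / n))
```

A rotated and translated copy of a structure should give an RMSD of essentially zero, while genuinely different coordinate sets give the residual deviation that remains after the best fit.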



Figures 

Figures 3–5(a) and (b) were generated with BobScript (Figure 3) and MolScript (Figures 4 and 5), respectively. Figure 5(c) and (d) were generated with ISIS/Draw 2.4.



Atomic coordinates

The refinement statistics are summarized in Table 1. The coordinates and structure factors have been deposited with the Protein Data Bank (accession codes 1GZC and 1GZ9).



Disclaimer 

Certain commercial materials, instruments and equipment are identified in this manuscript in order to specify the experimental procedure as completely as possible. In no case does such identification imply a recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the materials, instruments or equipment identified are necessarily the best available for the purpose.
The crystal structure of a Man/Glc-specific lectin from the seeds of the bloodwood tree (Pterocarpus angolensis), a leguminous plant from central Africa, has been determined in complex with mannose and five manno-oligosaccharides. The lectin contains a classical mannose-specificity loop, but its metal-binding loop resembles that of lectins of unrelated specificity from Ulex europaeus and Maackia amurensis. As a consequence, the interactions with mannose in the primary binding site are conserved, but details of carbohydrate binding outside the primary binding site differ from those seen in the equivalent carbohydrate complexes of concanavalin A. These observations explain the differences in their respective fine specificity profiles for oligomannoses. While Man(α1-3)Man and Man(α1-3)[Man(α1-6)]Man bind to PAL in low-energy conformations identical with that of ConA, Man(α1-6)Man is required to adopt a different conformation. Man(α1-2)Man can bind only in a single binding mode, in sharp contrast to ConA, which creates a higher affinity for this disaccharide by allowing two binding modes.

© 2003 Elsevier Ltd. All rights reserved. 
Introduction

Protein-carbohydrate recognition is a major 
form of inter-cellular communication and plays a 
role in many biologically important processes 
such as viral, bacterial, mycoplasmal and parasitic 
infections, targeting of cells and soluble com- 
ponents, fertilisation, cancer metastasis and growth 
and differentiation. 1 

The specific recognition of an (oligo)saccharide by a protein is a much more complex problem than other biologically relevant recognition processes such as protein–protein or protein–DNA interactions. The monosaccharide building blocks of a glycan are difficult to distinguish from each other due to the limited repertoire of functional groups involved. Apart from the occasional N-acetyl group or, rarely, a carboxylate group, one invariably finds a large abundance of hydroxyl groups interspersed with small aliphatic patches. In addition, the glycosidic bonds between two monosaccharides are rather flexible, especially the 1-6 linkage. As a consequence, a high entropic cost limits the binding affinities that can be obtained. The combination of flexibility and the difficulty in distinguishing monomeric building blocks allows oligosaccharides to mimic each other structurally, making the task of specific recognition a truly difficult one.

Abbreviations used: ConA, concanavalin A from Canavalia ensiformis; PAL, Pterocarpus angolensis Man/Glc-specific seed lectin; MAL, Maackia amurensis leukoagglutinin; GS-IV, Griffonia simplicifolia lectin IV; UEA-II, Ulex europaeus lectin II; FRIL, Flt3 receptor-interacting lectin from Dolichos lablab; LOL, Lathyrus ochrus lectin.

E-mail address of the corresponding author: reloris@vub.ac.be

The lectins from leguminous plants have been considered a model system for studying the molecular basis of protein–carbohydrate interactions for several decades. Among all known lectin families, whether animal, plant or microbial, they cover the widest range of carbohydrate specificities. Their carbohydrate recognition site consists of several loops, with differing degrees of variability. The conformations of these loops are determined by the presence of a structural calcium ion and a transition metal ion, the absence of which results in local unfolding and loss of carbohydrate-binding capacity. The vast amount of structural and biochemical data already available allows us to rationalise the structural determinants of carbohydrate specificity. Indeed, the crystal structures of more than 25 legume lectins have been determined and are present in the Lectin Database.† Here, we present the crystal structure of the seed lectin (PAL) from the bloodwood tree (Pterocarpus angolensis, a leguminous plant from central Africa) in complex with a series of oligomannose ligands. This lectin belongs to the Man/Glc specificity group and has a preference for Man(α1-2)Man and Man(α1-3)[Man(α1-6)]Man (L. Buts & S.B., unpublished results). Its fine specificity differs from that of other known Man/Glc-specific lectins, while its amino acid sequence suggests some peculiarities in the carbohydrate-binding sites not seen in other legume lectins with related specificities.

† http://www.cermav.cnrs.fr/lectines/

[Figure 1: sequence alignment of the lectins from P. angolensis, A. hypogaea, L. ochrus, C. ensiformis, D. lablab, G. simplicifolia, M. amurensis and U. europaeus, with the carbohydrate-binding stretches A–E marked]

Figure 1. Amino acid sequence of the P. angolensis lectin. The amino acid sequence of PAL as used in the crystal structure determination is aligned with those of other legume lectins discussed in the text. Residues where the crystal structure conflicts with the cDNA-derived sequences are indicated by an asterisk (*). Positions where sequence variation was observed among the different cDNA clones sequenced are indicated by a +. The five stretches A–E that constitute the carbohydrate-binding site are indicated and shaded in different colours.



Results and Discussion 

Amino acid sequence and overall structure 
of PAL 

The amino acid sequence was deduced from 14 cDNA clones (data not shown). The lectin gene starts with an ATG start codon followed by a signal sequence corresponding to 34 amino acid residues. This signal peptide ends with a putative signal peptide cleavage site between residues Ser and Gln1. This glutamine is the N-terminal residue
observed in all X-ray structures and is present in 
the form of a cyclic glutamine residue. This is in 
agreement with the fact that the N terminus of the 
affinity-purified protein was proven to be blocked 
(data not shown). 

At several places, heterogeneity was observed in the cDNA sequences: Ile/Thr26, Leu/Pro108, Phe/Leu129, Asp/His130, Asn/Asp143 and Gly/Arg212. In the electron density maps of our crystal structures, these residues were identified as Ile26, Leu108, Phe129, Asp130, Asn143 and Arg212. Since the sequences were determined starting from mRNA, it can be assumed that all the corresponding protein variants are indeed synthesized. With the exception of Ile/Thr26, all these residues are located at or near the carbohydrate-binding site. Phe/Leu129 and Asp/His130 in particular are known to be critical for carbohydrate and metal binding and are conserved in other legume lectins. It is possible, therefore, that some of the cDNA clones correspond to proteins that are closely related to PAL but are inactive or have altered carbohydrate specificity.

Figure 1 aligns the sequence of mature PAL as used in the crystal structure determination with those of other legume lectins discussed here. At two positions, the electron density was in disagreement with the cDNA-derived sequences. Position 206 was interpreted as Val rather than Ala, since clear side-chain density was present; Val was chosen over Thr, as this side-chain points towards a hydrophobic area without any potential hydrogen bond donors or acceptors. At position 220, Gly was modelled instead of Arg. There is no electron density for an Arg side-chain (including its Cβ atom) in this otherwise well-defined region of the electron density map, and the presence of even a small side-chain at this position would clash with any sugar bound in the carbohydrate recognition site.

PAL has the highest level of sequence identity (64%) with the Man/Glc-specific lectin from Arachis hypogaea (peanut), for which no crystal structure is available. Among those lectins for which the crystal structure has been determined, the sialyllactose-specific MAL from Maackia amurensis (the Amur maackia tree from Asia) is most closely related (43% sequence identity), while other Man/Glc-specific lectins such as ConA, FRIL and LOL show only 40-42% identity.
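The identity figures above (64%, 43%, 40-42%) depend on how identity is counted over an alignment, and the text does not state its convention. The following is a minimal sketch that assumes two pre-aligned sequences with '-' for gaps and counts identity only over gap-free columns. The toy sequences are hypothetical, not real lectin sequences.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity between two sequences taken from one pairwise alignment.

    Assumption: '-' marks gaps and identity is counted only over columns where
    neither sequence has a gap. Other conventions (e.g. dividing by the length
    of the shorter sequence) give different values.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aln_a, aln_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


# Hypothetical toy alignment (not real lectin sequences).
print(f"{percent_identity('ADGIAF-FIA', 'ADGLAFSFVA'):.1f}%")  # 7 of 9 gap-free columns
```

Different alignment programs and identity definitions can shift such percentages by several points. Comparisons like "64% versus 43%" are therefore most meaningful when computed the same way.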

The overall structure of PAL has been described and is shown in Figure 2. It consists of the typical legume lectin β-sandwich, the details of which are well known.23

All structures presented show good electron density for residues 1-238. Tyr239 was fitted into the density, but the temperature factors for this residue remain high. Weak density is seen for residues Thr240 and Ala241. The bound carbohydrate molecules display clear electron density in each of the complexes (Figure 3).

There are two legume lectin monomers in the asymmetric unit, together forming the canonical dimer (Figure 2), as has been observed in many other dimeric and tetrameric legume lectins. As a consequence of the crystal packing, the carbohydrate-binding sites of subunits A and B in the asymmetric unit of the crystals are not equivalent. The binding site of subunit A is involved in crystal packing: in all carbohydrate complexes, the ligand bound to the binding site of subunit A makes contacts with protein atoms from a symmetry mate. The binding site of subunit B, on the other hand, is not involved in crystal packing. In all but one of the carbohydrate complexes, binding appears to be identical in both subunits. The only exception is the Man(α1-3)[Man(α1-6)]Man complex, where the conformation adopted in binding site B is prevented in site A by steric hindrance from a symmetry-related protein molecule.

Figure 2. Overall structure of the P. angolensis lectin. Stereo cartoon representation of the PAL dimer (Man(α1-3)Man complex). One monomer is coloured orange, the other one yellow. Manganese ions are shown as light blue spheres and calcium ions as green spheres. Two bound molecules of Man(α1-3)Man are shown in ball-and-stick representation.

Figure 3. Experimental electron densities of the bound carbohydrates: (a) α-Me-mannose, (b) Man(α1-2)Man, (c) Man(α1-3)Man, (d) Man(α1-4)Man, (e) Man(α1-6)Man, (f) Man(α1-6)[Man(α1-3)]Man in monomer A and (g) Man(α1-6)[Man(α1-3)]Man in monomer B. In each case except (g), density in the binding site of the A monomer is shown. In all cases, the electron density map is an Fo - Fc map contoured at 3.0σ and obtained by taking the final refined model and deleting the carbohydrates in both binding sites.

Carbohydrate-binding site 

The carbohydrate-binding site of PAL can be described as one with a primary binding site that recognises glucose and mannose. This primary binding site lies in the centre of a shallow groove on the surface of the protein (Figure 4(a)). Extensions to the sugar bound in the primary binding site (M) can be made in two directions, following O1 or O2. Independent of the exact linkage that is present, these additional sugar residues occupy two particular regions of the lectin, which we call the -1 (attached to O2) and +1 (attached to O1) subsites. In the case of longer oligosaccharides, we speak of additional +2, +3, ... or -2, -3, ... subsites, depending on their position relative to the primary binding site.

The carbohydrate-binding site of all legume lectins consists of residues belonging to five polypeptide stretches (termed A to E according to Sharma & Surolia1) (Figure 4(b)), which vary to different degrees between lectins with different specificities. Stretches A and B contain an essential aspartate residue (invariably preceded by a cis-peptide bond) and a backbone NH group (usually from a glycine residue; Gly106 in PAL), respectively. The conformations of these two stretches do not vary much among different lectins, and the current structures confirm this picture.

Stretch C is the metal-binding loop and wraps around the structurally important calcium and manganese ions. In the known crystal structures, five different loop sizes (12-16 residues) adopting a total of eight different conformations are observed (Figure 5(a)). There is no simple relationship between monosaccharide specificity and the length and conformation of loop C. In PAL, this loop has a length of 16 residues and is identical with that seen in the two sialyllactose-specific lectins from M. amurensis and in the chitobiose-specific lectin II from common gorse (Ulex europaeus). The other Man/Glc-specific lectins for which crystal structures are available have loop lengths of either 14 (FRIL, ConA and related lectins) or 15 (LOL and related lectins) residues. Although the backbone conformation of loop C is not a determinant of monosaccharide specificity, specific side-chains on this loop nevertheless influence the nature of the sugar that can be accommodated in the binding site (see below).

Figure 4. Architecture of the carbohydrate-binding site. (left) Stereo view of a CPK model of PAL with the four stretches that constitute the carbohydrate-binding site coloured: blue for stretch A containing cis-Asp86, orange for loop B containing Gly106, yellow for metal-binding loop C and green for specificity loop D. These colours are similar to those used in Figure 1 and are maintained in all the other Figures. Superimposed is the ball-and-stick model of a virtual tetrasaccharide Man(α1-2)Man(α1-6)[Man(α1-3)]Man indicating the different subsites (M, primary binding site; -1, +1 and +2, downstream and upstream subsites). (right) Stereo MOLSCRIPT representation showing the five stretches A-E that make up the carbohydrate-binding site. Side-chains of residues important for carbohydrate binding are shown in ball-and-stick and are labelled. The calcium and manganese ions are shown as large green and yellow spheres, respectively.

Figure 5. Loop conformations in the carbohydrate-binding sites of legume lectins. (a) Conformations observed for loop C, grouped by loop length (12 to 16 residues), and (b) conformations observed in loop D, grouped by loop length, suggesting the use of canonical loop conformations to modulate carbohydrate specificity in a way similar to that seen in the CDR loops of antibodies.

Figure 6. Interaction of mannose in the primary binding site. Stereo view of the superposition of the monosaccharide-binding sites of PAL complexes with MeαMan (coloured) and MeαGlc (light grey). Selected residues are labelled. The complexes are structurally identical with the exception of the orientation of the O2 atom of the sugars. The difference in conformation for Glu221 is most likely due to the less well defined electron density of this residue in the MeαGlc structure and is probably not of biological relevance.

In contrast to stretches A-C, stretch D does not interact directly with the structural calcium ion. It is highly variable in length, conformation and sequence and is often referred to as the monosaccharide specificity loop.3 It is thought to be the prime determinant of monosaccharide as well as oligosaccharide specificity. In the PAL structure, the conformation adopted by this ten-residue loop is identical with that found in all other known crystal structures of Man/Glc-specific lectins15-17,26 (Figure 5(b)), supporting this notion.

Finally, stretch E interacts with a bound carbohydrate in only a few cases, where it is part of the -1 subsite: M. amurensis leukoagglutinin (MAL) in complex with sialyllactose, Griffonia simplicifolia lectin IV (GS-IV) in complex with the Leb tetrasaccharide, and ConA in complex with Man(α1-2)Man.30 In most complexes of PAL, it is not involved in carbohydrate binding, the exception being the Man(α1-2)Man complex.



Interactions in the primary binding site 

Methyl-α-D-mannopyranoside (MeαMan) binds to the primary binding site in the same way as has been observed repeatedly for other Man/Glc-specific lectins (Figure 6). The sugar interacts with the protein via a series of hydrogen bonds with the conserved Asp/Gly/Asn triad as well as with the backbone of specificity loop D (hydrogen bond between Glu221 and O5). In addition, the side-chain of Phe132 from the metal-binding loop C stacks favourably upon the sugar ring, while Gly220 and the side-chain of Glu221 also make favourable van der Waals contacts with the sugar.

As observed for most other Glc/Man-specific legume lectins, PAL has twice the affinity for MeαMan as for MeαGlc (our unpublished results). In our crystal structure, the axial O2 of mannose makes van der Waals contacts with the Cα atoms of Gly106 and Gly220, possibly forming CH···O hydrogen bonds (C-O distances 3.7 Å and 3.3 Å, respectively). This interaction is absent from the MeαGlc complex owing to the equatorial orientation of O2 in this sugar. As a consequence, a small void is present that is not filled with an ordered water molecule.

Non-specific subsite interactions: Man(α1-3)Man, Man(α1-4)Man and Man(α1-6)Man

The affinities of PAL for Man(α1-3)Man, Man(α1-4)Man and Man(α1-6)Man are essentially identical with that for MeαMan (our unpublished results). Clear density for the disaccharides has been observed in both subunits. The conformation adopted by the disaccharide is in each case identical for both subunits, despite the involvement of the binding site of subunit A in crystal packing. All bind with their non-reducing mannose in the primary binding site, so that the reducing mannose occupies subsite +1. The interactions with this subsite are sufficiently favourable to allow each of the disaccharides to adopt a unique conformation, but insufficient to result in an enhanced affinity. All three disaccharides make at least one additional hydrogen bond with amino acid residues in the +1 site. This situation is similar to what has been observed for ConA,31 where Man(α1-6)Man binds with the same affinity as MeαMan, despite an additional hydrogen bond from the protein to the reducing mannose.

Figure 7. Recognition of Man(α1-3)Man and Man(α1-6)Man by PAL. (a) Stereo view of the PAL-binding site in complex with Man(α1-3)Man. The disaccharide is shown in blue and loops of PAL are coloured as in Figure 2. The equivalent loops and side-chains of the ConA-Man(α1-3)Man complex are shown in light grey. The ConA-bound disaccharide is shown in black. (b) A similar stereo view of the PAL-Man(α1-6)Man complex superimposed on the ConA-Man(α1-6)Man complex.

Figure 7 compares the binding modes of Man(α1-3)Man and Man(α1-6)Man with those observed previously in their complexes with ConA.31 Man(α1-3)Man binds to ConA and PAL in an identical conformation and orientation, although the residues that make up subsite +1 differ in the two proteins (Figure 7(a)). Man(α1-6)Man recognition, on the other hand, differs between the two proteins, and the disaccharides adopt different low-energy conformations (Figure 7(b)). The conformation found in ConA cannot be adopted in the PAL structure because of clashes with the metal-binding loop C, notably the side-chains of Ala134 and Asp136. The conformation found in PAL could, on the other hand, fit in the binding site of ConA, provided that the α-methyl group on O1 were reoriented, but no hydrogen bond would then be formed in the +1 site.

Specificity for α versus β linkages

Similar to other Man/Glc-specific legume lectins, the P. angolensis lectin has a strict requirement for α-linkages. The chitobiose-specific lectin II from U. europaeus (UEA-II), on the other hand, can accommodate only a β-linkage. The different crystal structures of UEA-II-carbohydrate complexes have shown that the N-acetyl groups of chitobiose are not crucial for binding and that its primary binding site can accommodate mannose and glucose equally efficiently.

Figure 8. Binding of Man(α1-4)Man by PAL. (a) A stereo view of the PAL-binding site in complex with Man(α1-4)Man (blue); loops are coloured as in Figure 2. In black, the co-ordinates of GlcNAc(β1-4)GlcNAc as seen in the binding site of UEA-II are superimposed. In PAL the β anomeric configuration is made impossible by residues Glu221 and Gln222 in the specificity loop of PAL. (b) A stereo view of the UEA-II-binding site in complex with GlcNAc(β1-4)GlcNAc (blue). In black, the co-ordinates of Man(α1-4)Man as seen in the binding site of PAL are superimposed. In UEA-II the α anomeric configuration is made impossible by the bulky side-chain of Tyr135, which is replaced by Ser137 in PAL.

Figure 8 compares the binding of Man(α1-4)Man to PAL with that of GlcNAc(β1-4)GlcNAc (chitobiose) to UEA-II. The bulky Tyr135 in metal-binding loop C prevents formation of an α-linkage in the binding site of UEA-II and is replaced by the small Ser137 in PAL. Otherwise, the backbone conformations of this loop are very similar in PAL and UEA-II. On the other hand, Glu221 in specificity loop D of PAL sterically prevents a β-linkage from being accommodated in the binding site of this protein. To allow for a β-linkage, Glu221 would need to be truncated to at least Gly. In UEA-II, the specificity loop is two residues longer and locally adopts a different conformation, allowing a β-linked disaccharide to be bound. Thus, the selection for α- or β-glycosidic linkages appears to be controlled by a few specific amino acid residues and does not depend critically on the type of metal-binding loop, as was suggested earlier.

Figure 9. Recognition of Man(α1-2)Man by PAL. (a) A stereo view of the PAL-Man(α1-2)Man complex superposed on the equivalent ConA complex (non-reducing mannose bound in subsite -1). Colouring is as in Figure 3. (b) An identical view of PAL, but superimposed on the ConA-Man(α1-2)Man complex with the second mannose in subsite +1.

Man(α1-2)Man binds with its reducing mannose in the primary binding site

In a typical Man/Glc-specific legume lectin, O2 of a bound MeαMan points towards the solvent. This means that, at least in theory, Man(α1-2)Man would be able to bind with either its reducing or its non-reducing mannose in the primary binding site. Such a situation is observed in the complex of concanavalin A (ConA) with Man(α1-2)Man,30 and has been argued to contribute significantly to the enhanced affinity of this lectin for Man(α1-2)Man compared to MeαMan.32

PAL has a 2.5-fold higher affinity for Man(α1-2)Man than for MeαMan (our unpublished results). Nevertheless, in the current Man(α1-2)Man complex only a single binding mode is observed, with the reducing mannose in the primary binding site and the second mannose in subsite -1. The alternative binding mode, with the second mannose in the +1 subsite, is not possible in PAL, as it would lead to a steric clash with metal-binding loop C, which is two residues longer than in ConA, in particular with the side-chains of Ala134, Asp136 and Ser137 (Figure 9).

In the binding mode observed in PAL, the conformation of the disaccharide roughly resembles the equivalent one found in the ConA complex, but is not identical. The difference in conformation is due to a single amino acid substitution in loop B: Gly104 in PAL is replaced by Thr226 in ConA. This forces a 30° rotation around φ (defined as the torsion angle O5-C1-O1-C2), which prevents a van der Waals clash and establishes a hydrogen bond between O1 of Man(-1) and Thr226 OG1 (Figure 9).
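The glycosidic torsion angle φ is defined above by four atoms, so it can be computed directly from atomic coordinates, for example those in the deposited PDB entries. The following is a minimal sketch of the standard torsion-angle calculation, returning values in (-180°, 180°]. The coordinates in the usage lines are made up and do not come from any PAL structure.

```python
import math

Vec3 = tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion(p0: Vec3, p1: Vec3, p2: Vec3, p3: Vec3) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, in the range (-180, 180].

    For phi as defined in the text, pass the O5, C1, O1 and C2 coordinates.
    """
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # Project the two outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Made-up coordinates: a trans arrangement and a -90 degree arrangement.
print(round(torsion((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)), 6))  # 180.0
print(round(torsion((1, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1)), 6))   # -90.0
```

A 30° difference in φ between two complexes would show up as the difference between the values returned for the corresponding atom quartets after superposition.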

Context-dependent protein-carbohydrate interactions in the Man(α1-3)[Man(α1-6)]Man complex

Man(α1-3)[Man(α1-6)]Man is the only carbohydrate for which there is a significant difference in binding between the two crystallographically independent lectin monomers. In the binding site of monomer B, which is not involved in crystal packing, clear density is seen for the whole trisaccharide. Outside the primary binding site, five direct hydrogen bonds and one water-mediated hydrogen bond are made between the protein and the carbohydrate.

In the binding site of monomer A, on the other hand, electron density is observed only for the mannose in the primary binding site. As there is no trace of density corresponding to the original Man(α1-3)Man (which was present before the soak with the trimannose), it has to be assumed that the trimannose is indeed bound in the binding site. The evidently favourable conformation of the trisaccharide as observed bound to monomer B is not accessible in monomer A because of severe steric conflicts with a symmetry-related protein molecule. Apparently, when specific subsite interactions cannot be made, the trisaccharide is bound in a disordered way, similar to that observed in the complex of pea lectin with the same trisaccharide.23

Table 1. X-ray data collection and refinement statistics

Carbohydrate       | MeαMan  | Man(α1-2)Man | Man(α1-3)Man | Man(α1-4)Man | Man(α1-6)Man | Man(α1-3)[Man(α1-6)]Man
Unit cell a (Å)    | 56.62   | 56.32        | 56.88        | 57.26        | 56.68        | 56.05
Unit cell b (Å)    | 83.67   | 82.96        | 83.03        | 83.37        | 83.38        | [illegible]
Unit cell c (Å)    | 123.00  | 122.00       | 122.94       | 122.96       | 122.37       | 122.96
Beamline           | BW7A    | X31          | BW7B         | X31          | X11          | BW7A
Resolution (Å)     | 1.70    | 2.20         | 1.75         | 2.05         | 2.05         | [illegible]
N_meas             | 210,120 | 101,201      | 395,205      | 209,468      | 160,081      | [illegible]
N_unique           | 64,245  | 25,266       | 57,268       | 36,449       | 37,147       | 50,664
Completeness (%)   | 97.6    | 94.9         | 96.6         | 99.9         | 97.5         | 99.7
R_merge            | 0.051   | 0.102        | 0.177        | 0.119        | 0.111        | 0.081
<I/σ(I)>           | 19.20   | 8.58         | 13.05        | 11.67        | 9.79         | 17.00
R                  | 0.1839  | 0.1776       | 0.1837       | 0.1641       | 0.1624       | 0.1805
R_free             | 0.2154  | 0.2176       | 0.2064       | 0.2179       | 0.2161       | 0.2052
PDB code           | 1ukg    | 1q8o         | 1q8p         | 1q8q         | 1q8s         | 1q8v

Concanavalin A has a 40 times higher affinity for the trisaccharide than for MeαMan, whereas PAL favours the trisaccharide by only a factor of four (our unpublished results). Figure 10 compares trimannose binding by ConA35 and by monomer B of PAL. Although there are many differences in the amino acid residues outside the primary binding site, some of the global features of trisaccharide binding are remarkably similar for both lectins. In both cases, the non-reducing mannose that is linked to O6 is found in the primary binding site. The "middle" (reducing) mannose makes extensive interactions with the protein. These interactions require, however, the presence of the third mannose (non-reducing, O3-linked). The details of the binding, such as specific hydrogen bonds formed outside the primary binding site, nevertheless differ markedly between ConA and PAL.
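The affinity ratios quoted above (40-fold for ConA, four-fold for PAL) can be restated as differences in binding free energy using ΔΔG = -RT ln(K_a,trisaccharide / K_a,MeαMan). The following is a minimal sketch that assumes the ratios refer to association constants and that the temperature is 298.15 K. The text states neither, so the resulting numbers are illustrative conversions rather than values reported by the authors.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def ddg_from_ratio(ratio: float, temperature_k: float = 298.15) -> float:
    """Binding free-energy difference (kcal/mol) implied by a ratio of association constants.

    ddG = -RT ln(Ka_ligand / Ka_reference); a negative value means tighter binding.
    """
    if ratio <= 0:
        raise ValueError("affinity ratio must be positive")
    return -R_KCAL * temperature_k * math.log(ratio)


for lectin, ratio in [("ConA", 40.0), ("PAL", 4.0)]:
    print(f"{lectin}: trimannoside vs MeαMan, ddG = {ddg_from_ratio(ratio):.2f} kcal/mol")
```

Under these assumptions, the extra trisaccharide contacts are worth roughly -2.2 kcal/mol in ConA but only about -0.8 kcal/mol in PAL. The same function applied to the two-fold and 2.5-fold ratios mentioned earlier gives differences well below 1 kcal/mol.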



Materials and Methods 

Purified P. angolensis seed lectin (PAL) was available from a previous study.22 Internal amino acid sequences were obtained from four peptides after digestion of the purified P. angolensis lectin with trypsin. The amino acid sequences of these peptides (SPLSNGADGIAFFIA, and WDPDYPH) were used to design degenerate primers. Total RNA was extracted from ripening P. angolensis seeds (collected in Zimbabwe and stored at -80°C) using the RNeasy Plant Mini Kit (Qiagen) and converted into double-stranded cDNA using the 5'RACE System for rapid amplification of cDNA ends (GibcoBRL) and the QT primer.37 This cDNA was amplified with the
degenerate forward primer Muk-2: 

5'-CCIGCICAYGGIATHGCITTYTT-3'

and the reverse primer Muk-5: 

3'-ACCCTRGGICTRATRCGICT-5'

After amplification, the PCR fragments were cloned into pUC18 using the Sure-Clone-Ligation kit (Amersham Pharmacia Biotech) and sequenced.
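The degenerate primers Muk-2 and Muk-5 contain IUPAC ambiguity codes (Y, H, R) and inosine (I). Each primer is therefore a mixture of related oligonucleotides. The following is a minimal sketch that counts how many distinct sequences each primer represents, using the sequences as read from the text. It assumes that inosine counts as a single species because one nucleoside pairs with several bases. These counts are not stated in the paper.

```python
from math import prod

# Number of bases matched by each IUPAC nucleotide code. Inosine (I) is a single
# nucleoside that pairs with several bases, so it adds no extra species to the
# primer mixture and is counted as 1 here (an assumption of this sketch).
CODE_FOLD = {
    "A": 1, "C": 1, "G": 1, "T": 1, "I": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}


def primer_degeneracy(seq: str) -> int:
    """Number of distinct oligonucleotides encoded by a degenerate primer."""
    try:
        return prod(CODE_FOLD[base] for base in seq.upper())
    except KeyError as err:
        raise ValueError(f"unknown nucleotide code: {err.args[0]!r}") from None


print(primer_degeneracy("CCIGCICAYGGIATHGCITTYTT"))  # Muk-2 -> 12
print(primer_degeneracy("ACCCTRGGICTRATRCGICT"))     # Muk-5 -> 8
```

Replacing four-fold degenerate codon positions with inosine is what keeps these mixtures small. With N instead of I at those positions, the degeneracy would grow by a factor of four per position.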
Specific primers Muk-6:

5 / <^VTCGCACCGCCGGATACTAC-3' 

Muk-8:

3^AGTITrGTGCACX;CCCiTGG-5' 

and Muk-9: 

3'-GATCCCGAGAAACGTGGACC-5'

were designed on the basis of the sequence information gathered from the PCR fragment obtained with the degenerate primers Muk-2 and Muk-5, to amplify and clone the nucleotide sequence corresponding to the 3'-end and the 5'-end of the P. angolensis mRNA.

The 3'-end of the cDNA was amplified with Muk-6 and the QO primer.37 The 5'-end of the P. angolensis mRNA was amplified after adding a poly(C) tail to the 5'-end of the cDNAs (GibcoBRL). An initial PCR amplification was carried out with Muk-8 and RAAP (GibcoBRL), followed by a second PCR amplification using the nested primers Muk-9 and AUAP (GibcoBRL). The PCR fragments were cloned into pUC18 using the Sure-Clone-Ligation kit (Amersham Pharmacia Biotech) and sequenced.

Full-length cDNAs were subsequently amplified using primers Muk-32:

5'-CCCAAATATAATAAAAAGCGCTACCCATCTC-3'

(located in the 5'-untranslated sequence), Muk-14:

5'^CTCCCTCIXXTTC^ 
(located in the signal peptide) or Muk-15:

5'-ATGCTACTGAACAAAGCATACT-3'

(located in the signal peptide) in combination with the QO primer.37 The PCR fragments were cloned in pUC18 and sequenced.

Sequencing was performed following the dideoxynucleotide termination method of Sanger et al.38 using the Thermo Sequenase Radiolabeled Termination Cycle Sequencing kit (Amersham Pharmacia Biotech). Separation of the resulting DNA fragments was performed on a 5% (w/v) polyacrylamide gel containing 8 M urea and Tris-borate buffer (100 mM Tris, 100 mM boric acid). Detection of the sequencing ladders was achieved by autoradiography.

Crystallisation and data collection 

All carbohydrates were purchased from Dextra Laboratories (Reading, UK) and are methylated on their reducing O1, even when this is not specifically mentioned. Crystals of PAL in complex with Man(α1-3)Man were prepared as described for the methyl-α-D-glucose complex,12 with the exception that 10 mM Man(α1-3)Man was used as a ligand instead of the glucoside. The complexes of the lectin with the other carbohydrates given in Table 1 were obtained by transferring the crystals of the Man(α1-3)Man complex to artificial mother liquor (200 mM calcium acetate, 100 mM sodium cacodylate (pH 6.5), 20% (w/v) PEG8000) containing increasing concentrations of the desired ligand (100 mM final concentration, reached in four steps).

All X-ray data were collected at room temperature on the EMBL beamlines of the DESY synchrotron (Hamburg, Germany). The data were processed with DENZO and SCALEPACK.39 The statistics of the data collections are given in Table 1.

Structure determination 

The structure of the Man(α1-3)Man complex was determined by molecular replacement using the co-ordinates of lentil lectin (PDB code 1len)40 as a search model. Two clear solutions were found with AMoRe41 that together formed the lectin dimer. Refinement was carried out using the mlf target of CNS 1.0.42 Cross-validation, bulk solvent correction and anisotropic temperature factor scaling were used throughout. Rounds of slow-cool simulated annealing and restrained B-factor refinement using all available data were alternated with manual fitting in electron density maps using TURBO.43 At the beginning of the study, the amino acid sequence was still unknown and a sequence was derived directly from the electron density maps. Towards the end of the refinement, the cDNA sequence became available, and this information was used to finalise the structure. At this stage, simulated annealing was abandoned in favour of conventional positional refinement, and water molecules were fitted into the electron density.

The structures of all other isomorphous lectin-carbohydrate complexes were determined using the refined co-ordinates of the Man(α1-3)Man complex (stripped of its carbohydrate ligands and water molecules) as the starting model. After rigid body refinement, a slow-cool stage was used to uncouple R and Rfree. From then on, restrained positional and B-factor refinement were alternated with manual fitting in electron density maps. The refinement statistics for all complexes are given in Table 1.
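Table 1 reports R and Rfree for each complex. The cross-validation mentioned above means that Rfree is evaluated over a small test set of reflections excluded from refinement. The following is a minimal sketch of the conventional definitions, R = Σ| |Fo| - |Fc| | / Σ|Fo|, computed separately over the working set and the test set. It assumes that Fc has already been scaled to Fo. The amplitudes and flags are made up, and the bulk-solvent and scaling models applied by CNS are not reproduced.

```python
from typing import Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float]) -> float:
    """Conventional crystallographic R = sum(| |Fo| - |Fc| |) / sum(|Fo|)."""
    if len(f_obs) != len(f_calc) or len(f_obs) == 0:
        raise ValueError("need equal-length, non-empty amplitude lists")
    num = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    return num / den


def r_work_and_free(f_obs: Sequence[float], f_calc: Sequence[float],
                    in_test_set: Sequence[bool]) -> tuple[float, float]:
    """R over the working reflections and Rfree over the held-out test set."""
    work = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, in_test_set) if not t]
    test = [(fo, fc) for fo, fc, t in zip(f_obs, f_calc, in_test_set) if t]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test])
    return r_work, r_free


# Made-up amplitudes; the third reflection is flagged for the test set.
fo = [100.0, 200.0, 50.0, 80.0]
fc = [90.0, 210.0, 55.0, 80.0]
flags = [False, False, True, False]
r, r_free = r_work_and_free(fo, fc, flags)
print(f"R = {r:.4f}, Rfree = {r_free:.4f}")  # R = 0.0526, Rfree = 0.1000
```

Because the test reflections never enter refinement, a gap between R and Rfree of a few percentage points is expected. The values in Table 1 (for example 0.1839 versus 0.2154) fall in that range.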

Superpositions of crystal structures were done using TURBO.43 Figures 2-10 were produced using MOLSCRIPT44 and Raster3D.45

Data Bank accession numbers 

The nucleotide sequences of partial and full-length 
cDNAs have been deposited at GenBank with accession 
numbers from AJ426054 to AJ426062. 

Co-ordinates and structure factors were deposited at the RCSB Protein Data Bank as entries 1ukg, 1q8o, 1q8p, 1q8q, 1q8s and 1q8v.
Knowledge of structural relationships in proteins is increasingly proving very useful for in silico characterizations and is also being exploited as a prelude to almost every investigation in functional and structural genomics. A thorough understanding of the crucial features of a fold becomes necessary to realize the full potential of such relationships. To illustrate this, structures containing the legume lectin-like fold were chosen for a detailed analysis, since they exhibit a total lack of sequence similarity among themselves and also belong to diverse functional families. A comparative analysis of 15 different families containing this fold was therefore carried out, which led to the determination of the minimal structural principles or the determining region of the fold. A critical evaluation of the structural features, such as the curvature of the front sheet, the presence of the hydrophobic cores and the binding site loops, suggests that none of them are crucial for either the formation or the stability of the fold, but that they are required to generate diversity and specificity for particular carbohydrates. In contrast, the presence of the three sheets in a particular geometry and also their topological connectivities seem to be important. The fold has been shown to tolerate different types of protein-protein associations, most of them exhibiting different types of quaternary associations and some even existing as complexes with other folds. The function of every family in this study is discussed with respect to its fold, leading to the suggestion that this fold can be linked to carbohydrate recognition in general.

Keywords: carbohydrate binding/β-sandwich/structural determinants/structural relationships



Introduction 

Examination of the hitherto identified protein folds reveals that the available protein structures cluster into limited regions of the entire conformational space (Holm and Sander, 1995). This means that several protein families share a common structural fold, some of which are obvious from the sequence similarities that they exhibit. It is also well known that protein evolution gives rise to families of structurally related proteins, within which sequence similarities can be extremely low. Such unanticipated relationships in known structures have been identified effectively by structure-based classifications (Holm and Sander, 1996). Several excellent databases featuring structural classifications of protein structures have been developed in recent years [SCOP (Murzin et al., 1995), FSSP (Holm and Sander, 1996), CATH (Orengo et al., 1997)]. These databases serve as useful guidelines to study the overall folds and structures of various proteins encoded by a genome. To realize the full potential of these relationships, it is essential to characterize the structural determinants or the minimal structural principles of the individual folds; the completion of several genome sequences, including that of the human genome, provides an additional impetus to such characterizations. Structures containing the legume lectin-like fold are so diverse in the families to which they belong that they provide classic examples for investigations of this type.

Lectins are carbohydrate-binding proteins that specifically recognize diverse sugar structures and mediate a variety of biological processes such as cell-cell and host-pathogen interactions, serum glycoprotein turnover and innate immune responses (Vijayan and Chandra, 1999). Lectins are found in most organisms, ranging from viruses and bacteria to plants and animals (Lis and Sharon, 1996). They represent a heterogeneous group of oligomeric proteins that vary widely in size, structure, molecular organization and the constitution of their combining sites. Nonetheless, many of them belong to distinct protein families, classified on the basis of biochemical, functional or structural properties. Although a number of lectins have been well studied, ambiguities still exist regarding their precise biological roles. A well-studied class of lectins from leguminous plants contains a characteristic fold (Loris et al., 1998; Bouckaert et al., 1999), often referred to as the legume lectin fold or simply the lectin fold. This fold is one of the widely occurring protein folds, represented by 14 distinct protein families in addition to legume lectins. The most striking feature of the fold is the total lack of sequence similarity among the different members exhibiting it. It is remarkable indeed that different members exhibiting this fold can show as little as 2% sequence identity. The recent addition of the structures of several legume lectins and also many other proteins possessing the fold (Berman et al., 2000; Bettler et al., 2001) enables us to analyse and compare these various structures in order to characterize the structural features of this fold. Here, we seek to compare the structures of the 15 different families and derive common structural features and determinants of the fold. In an attempt to relate fold to function, we also analyse features required for carbohydrate recognition, which happens to be the best recognized function of members with this fold.

Methods 

Initial identification of proteins containing the legume lectin-like fold was made using the SCOP (Murzin et al., 1995) and FSSP databases (Holm and Sander, 1996), which was followed by an analysis of related proteins in the CATH (Orengo et al., 1997) and the 3D lectin databases (Bettler et al., 2001). Further, to identify structural homologues, a thorough investigation of all available protein structures in the Protein Data Bank was carried out using two separate structure comparison algorithms, DALI (Holm and Sander, 1995) and VAST (Gibrat et al., 1996). All the identified proteins were analysed for particular features and re-classified based broadly on their known functions. Only those proteins which had a DALI Z-score >2.5 for at least 70% of the contents of the relevant domain are included here. This study involved an analysis of more than 300 structures, which were obtained from our local repository of coordinates, regularly downloaded from the Protein Data Bank (Berman et al., 2000). The MSI software package (InsightII version 98) was used to visualize, analyse and manipulate the various structures. The solvent accessibility calculations were carried out using the Connolly algorithm (Connolly, 1993). To deduce topology, the sequential connectivity of the individual strands in the three β-sheets was considered. The curvature of the sheet was computed by measuring the virtual angles subtended by the end Cα atoms at their mid-points. The average distance between two sheets was computed by measuring the distance between the centroids of the sheets, using only the Cα atoms of each sheet. Hydrophobic cores were considered present when a minimum of four residues, each with an overall accessibility of <10%, were in contact within a radius of 5 Å of each other, forming a cluster.
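The hydrophobic-core criterion above can be illustrated with a short clustering sketch. It is not the authors' InsightII workflow: it assumes each residue has already been reduced to one representative coordinate (for example a side-chain centroid) together with a relative solvent accessibility from a Connolly-type calculation, and it reads "in contact within a radius of 5 Å" as single-linkage clustering. The function name `hydrophobic_cores` and the toy residues are hypothetical.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def hydrophobic_cores(residues, max_access=0.10, contact=5.0, min_size=4):
    """Return clusters of buried residues that qualify as hydrophobic cores.

    residues: list of (name, (x, y, z), relative_accessibility) tuples.
    The coordinate is a hypothetical per-residue representative point.
    """
    # Keep only residues whose overall accessibility is below 10%.
    buried = [r for r in residues if r[2] < max_access]
    seen, cores = set(), []
    for i in range(len(buried)):
        if i in seen:
            continue
        # Grow a single-linkage cluster: any residue within 5 A of a member joins.
        stack, cluster = [i], []
        seen.add(i)
        while stack:
            j = stack.pop()
            cluster.append(buried[j][0])
            for k in range(len(buried)):
                if k not in seen and dist(buried[j][1], buried[k][1]) <= contact:
                    seen.add(k)
                    stack.append(k)
        # A cluster counts as a core only if it has at least four residues.
        if len(cluster) >= min_size:
            cores.append(sorted(cluster))
    return cores


example = [
    ("Phe1", (0.0, 0.0, 0.0), 0.02),
    ("Leu2", (3.8, 0.0, 0.0), 0.05),
    ("Val3", (3.8, 3.8, 0.0), 0.01),
    ("Ile4", (0.0, 3.8, 0.0), 0.08),
    ("Lys5", (1.9, 1.9, 1.0), 0.60),   # exposed, so excluded
    ("Trp6", (20.0, 0.0, 0.0), 0.03),  # buried but isolated
]
print(hydrophobic_cores(example))  # [['Ile4', 'Leu2', 'Phe1', 'Val3']]
```

Single-linkage clustering is only one reasonable reading of the criterion; a stricter reading (every member within 5 Å of every other member) would instead require searching for cliques.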

Results and discussion 
General description of the fold 

The structure of concanavalin A (ConA), the first legume lectin to be X-ray analysed, in 1972, exhibited a β-sandwich type of structure (Hardman and Ainsworth, 1972). Subsequently, about 100 more structures involving more than 15 different legume lectins in their complexed and uncomplexed forms have been studied. All of them share the same tertiary structure as ConA in their individual subunits. These include about 10 structures of peanut and winged bean basic and acidic lectins from our laboratory (Banerjee et al., 1994, 1996; Prabu et al., 1998; Manoj et al., 2000). A structure-based classification of proteins places them in a superfamily of ConA-like lectins and glucanases, one of the many in the all-β structural class. The subunits of legume lectins are most often made up of single polypeptide chains of ~250 amino acids exhibiting the legume lectin fold. The fold primarily consists of three β-sheets (a 'flat' six-membered 'back' β-sheet, a small 'top' β-sheet and a curved, seven-stranded 'front' β-sheet) and a number of loops interconnecting the sheets as well as the strands within them (Banerjee et al., 1996). In peanut lectin, for example, 110 out of 228 residues have β-structure; the remainder are in loops and turns connecting the strands. Legume lectins exhibit remarkable sequence homology and structural similarity among themselves, despite differences in sugar specificities and quaternary structures. Superpositions of the Cα atoms of the β-sheets of individual subunits of legume lectins, using various combinations, result in root mean square deviation (r.m.s.d.) values ranging from about 0.6 to 2.0 Å.
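The r.m.s.d. values quoted above depend on first superposing the Cα atoms. As a sketch only, assuming the residue equivalences between two subunits are already known and the Cα coordinates are supplied in matched order, the standard Kabsch least-squares superposition can be written with NumPy. The function `superposed_rmsd` and the toy coordinates are hypothetical, and the original analysis may have used different software and atom selections.

```python
import numpy as np


def superposed_rmsd(p, q):
    """r.m.s.d. between matched Calpha sets p and q (N x 3, same units, e.g. A)
    after optimal rigid-body superposition using the Kabsch algorithm."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    # Remove translation by centring both sets on their centroids.
    pc = p - p.mean(axis=0)
    qc = q - q.mean(axis=0)
    # SVD of the 3 x 3 covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(pc.T @ qc)
    # Flip the last axis if needed so that r is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = pc @ r.T - qc
    return float(np.sqrt((diff ** 2).sum() / len(p)))


# Toy check: a rotated and translated copy should superpose with r.m.s.d. ~ 0.
ca = np.array([[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0], [0.0, 3.8, 1.5]])
theta = np.radians(30.0)
rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta), np.cos(theta), 0.0],
                [0.0, 0.0, 1.0]])
moved = ca @ rot.T + np.array([5.0, -2.0, 1.0])
print(round(superposed_rmsd(ca, moved), 6))  # 0.0
```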

Analysis of structures in the Protein Data Bank reveals that there are many other families of proteins which exhibit the same legume lectin subunit fold. Structural homologues were identified using the DALI (Holm and Sander, 1995) and VAST algorithms (Gibrat et al., 1996), compared with the SCOP and FSSP databases and reclassified based on their known functions, as shown in Table I. ConA, the first legume lectin to be X-ray analysed, is somewhat atypical, as a post-translational modification involving a circular permutation results in a mature protein with amino and carboxyl termini at locations different from those in all other legume lectins of known three-dimensional structure. Therefore, instead of ConA, a subunit of tetrameric peanut lectin (2PEL:A), another thoroughly characterized lectin, will be used as the representative of legume lectins in the present study. The highest resolution structure of either the carbohydrate complex, where available, or of the apo protein from each of the other families was chosen as a representative structure for further analysis. The representative structures chosen are listed in Table I and illustrated in Figure 1. Tumour necrosis factor-α and the viral capsid proteins were also among the structural homologues of peanut lectin. They were not considered in this study, however, since their similarities lie below the cutoff criteria chosen here.

The terms β-sandwich fold and jelly roll fold have also been used in the literature to describe these proteins. Whereas the former term is technically correct, since the legume lectin fold is a type of β-sandwich, use of the latter term is debatable, as the legume lectin fold does not strictly conform to the definition of a jelly roll (Chelvanayagam et al., 1992). There are minor variations in topology among the different families of proteins exhibiting this fold. Therefore, strictly, they may be described as belonging to a set of closely related folds. However, as explained later, the differences in topology are very small, and the term legume lectin fold will be used to encompass all of them.

Characteristic features of the legume lectin fold 
The two main β-sheets, their relative orientation and the third β-sheet. Superposition of the β-strands in all structures with those in the first subunit of peanut lectin reveals that all of them have two β-sheets with a roughly similar mutual orientation. R.m.s.ds, the number of residues aligned and the structural similarity score are shown in Table I. The 'back' sheet ranged between four and six strands, with seven residues in each strand, in all the structures, while the front sheet showed 5-7 strands with 5-7 residues in each strand. This variation was restricted to the first two strands in the back sheet and the first three strands in the 'front' sheet. Strand regions corresponding to residues 64-70, 162-166, 173-179 and 186-192 of the back sheet and residues 84-90, 117-124, 136-143 and 149-153 of the front sheet of the peanut lectin structure can therefore be said to be invariant. Bovine spermadhesin and the insecticidal toxin, however, exhibit minor variations on this theme, since only the strands corresponding to the middle strands of both sheets in peanut lectin are present. It must be remembered, however, that these two families deviate the most from the typical legume lectin fold.

The two sheets are approximately parallel to each other and are separated by an average distance of ~13 Å, as measured by computing the distance between the centroids of the Cα atoms of the back and the front sheets. The presence of the 'back' and the 'front' sheets and the similarity of their relative orientation therefore clearly appear to be characteristic of the legume lectin fold. Several hydrophobic residues on both sheets have their side chains positioned between the sheets so as to form a hydrophobic cluster, which provides an important source of stability for maintaining the fold. A prominent feature of this hydrophobic cluster is the stacking of the aromatic side chains of one sheet against those of the other sheet. The observed distance of ~13 Å can be justified in terms of the optimal distance required for such interactions.
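The inter-sheet distance described above reduces to a distance between two centroids. A minimal sketch, assuming the Cα coordinates of each sheet have already been extracted into lists of (x, y, z) tuples; the helper names are hypothetical, and the toy coordinates are made up so that the answer is exactly 13 Å rather than taken from peanut lectin.

```python
from math import dist  # Euclidean distance (Python 3.8+)


def centroid(coords):
    """Mean position of a list of (x, y, z) Calpha coordinates."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def intersheet_distance(sheet_a, sheet_b):
    """Distance between the Calpha centroids of two beta-sheets."""
    return dist(centroid(sheet_a), centroid(sheet_b))


# Toy 'back' sheet in the z = 0 plane and a 'front' sheet stacked 13 A above it.
back = [(0.0, 0.0, 0.0), (4.8, 0.0, 0.0), (0.0, 4.8, 0.0), (4.8, 4.8, 0.0)]
front = [(x, y, 13.0) for x, y, _ in back]
print(round(intersheet_distance(back, front), 3))  # 13.0
```

Because only centroids are compared, this measure is insensitive to sheet curvature or twist; it captures the separation of the sheets, not their relative orientation.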

The third, small 'top' antiparallel sheet, made of five strands with only 2-4 residues in each strand, exists in all legume lectins. Its strands correspond to residues 25-27, 31-34, 217-220, 71-74 and 160-161 in peanut lectin (Banerjee et al., 1996). This sheet is not explicitly acknowledged as a separate β-sheet in most of the other structures. However, upon examination of their backbone dihedral angles and hydrogen bonding patterns, all 15 representative structures described here were found to contain segments corresponding to this sheet in an orientation analogous to that observed in peanut lectin. As in the case of the 'back' and the 'front' sheets, here too only the last three strands, corresponding to residues 217-220, 71-74 and 160-161, were invariant. A schematic diagram of the fold and a stereo view of the ribbon diagram of a subunit of peanut lectin are shown in Figure 2.
Concavity of the front sheet. The front sheet is curved in all legume lectins. In order to determine how important this curvature is for determining the legume lectin fold, the extent of curvature in all 15 structures was analysed by measuring the distances between the ends of the two middle strands and the virtual angles the ends subtend at the mid-points of the strands. The end distances and virtual angles average ~22 Å and ~150°, respectively, for galectins, the Charcot-Leyden crystal protein, spermadhesins and the insecticidal toxin, while the average values for legume lectins, arcelin, cellobiohydrolase, glucanases, xylanases, neurexins and tetanus neurotoxin are ~19 Å and ~120°. The pentraxins show values between these two types, with one of the strands showing more curvature than the other. These calculations confirm that, whereas the front sheet is almost flat in some proteins, it is significantly curved in most others, giving rise to a concave surface, which suggests that the fold can tolerate considerable variation in this parameter (Table II). It also suggests that the curvature of the front sheet is not critical for either the formation or the stability of the overall fold.
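The curvature measure can be sketched for a single strand. Assumptions not stated in the text: the strand is given as an ordered list of Cα coordinates, its 'mid-point' is taken to be the Cα of the central residue, and the end distance is taken as the end-to-end distance of that strand. The function `strand_curvature` is hypothetical. A perfectly straight strand gives a virtual angle of 180°, so smaller angles mean stronger curvature; this is why the ~120° values correspond to more concave front sheets than the ~150° values.

```python
from math import acos, degrees, dist


def strand_curvature(ca_coords):
    """Return (end-to-end distance, virtual angle in degrees) for one strand.

    Assumption: the mid-point is the Calpha of the central residue.
    """
    start, end = ca_coords[0], ca_coords[-1]
    mid = ca_coords[len(ca_coords) // 2]
    # Vectors from the mid-point to each end of the strand.
    v1 = [a - b for a, b in zip(start, mid)]
    v2 = [a - b for a, b in zip(end, mid)]
    dot = sum(a * b for a, b in zip(v1, v2))
    cos_angle = dot / (dist(start, mid) * dist(end, mid))
    # Clamp to guard against rounding just outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return dist(start, end), degrees(acos(cos_angle))


# A perfectly straight seven-residue strand with 3.5 A spacing.
straight = [(3.5 * i, 0.0, 0.0) for i in range(7)]
print(tuple(round(x, 2) for x in strand_curvature(straight)))  # (21.0, 180.0)
```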
Hydrophobic cores and surface loops. In addition to the first hydrophobic core between the two main sheets, present in all structures, a second hydrophobic patch has also been observed in some structures, such as peanut lectin and cellobiohydrolase. Upon examination of these structures, it appears that the curvature of the front sheet, together with the spatial disposition of the long loops connecting the strands, is responsible for this. Indeed, only those structures that had a curved front sheet and large loops connecting its strands exhibited a second hydrophobic patch; structures with either a nearly flat front sheet or a curved sheet with short loops did not (Table II). The solvent-accessible surface areas computed for the various structures confirm the presence of the first hydrophobic patch in all structures and of the second patch in some, correlating well with the curvature of the front sheet and the presence of large loops. It therefore appears that the second hydrophobic patch is not crucial for the fold. Each family showed a number of varying loops in its member structures, and in many cases loops played an important role in carbohydrate binding, e.g. the four large loops in legume lectins. Galectins, on the other hand, demonstrate that the loops can be rather small, as small as just two residues, and yet be capable of carbohydrate binding, as discussed in a later section. Again, this indicates that the loops do not play a critical role in the formation, stability or function of the fold but may give rise to specificity or differences in affinity for various oligosaccharides. An examination of the distribution of charged residues on the structures reveals that in most cases they are clustered on the front sheet, irrespective of its extent of curvature. Charged residues were found in some structures on either the back sheet or the top sheet, but these could often be explained in terms of quaternary associations.

Topology. The legume lectin fold has three β-sheets, as described in the previous section, and is classified in SCOP as exhibiting complex topology. The fold contains primarily a β-sandwich made of strands (1, 18, 6, 13, 14, 15) (2, 5, 16, 8, 9, 10, 11) with a lid made of strands (3, 4, 17, 7, 12), where each number denotes that assigned to the strand based on its position in the sequence. The pairs formed within the fold are all antiparallel and are made of strands 1-18, 2-5, 3-4, 4-17, 5-16, 6-13, 6-18, 7-12, 7-17, 8-9, 8-16, 9-10, 10-11, 13-14 and 14-15. The connectivities of the strands in the three sheets were identical among the superposable parts of all the representative structures, except for the small differences indicated below, suggesting that topology is an important characteristic feature of the fold. Figure 3 depicts the topological connections in peanut lectin in a schematic diagram. The figure also illustrates how this topology is substantially conserved in all the representative structures in spite of variations in the fold itself. The variations are primarily truncations of the sheets, insertions of additional segments or changes in the positions of the N- and C-termini in the sequence. A classical example is provided by ConA, where the topology of the sheets remains identical to that of peanut lectin except that the N- and C-termini are at different positions. The same phenomenon is observed in both pentraxins, where the termini are merely frame-shifted by three strands in the sequence. Pentraxins also have two α-helices, one between the first two strands of the back β-sheet and the other between the front and the top sheets. An insertion is also observed in the wings of neuraminidase, where an α-helix lies between the back and the front sheets, without any change in either the position of the termini or the topology. The structure of cellobiohydrolase presents a good example of an entire domain being inserted between the back and the front sheets, with two smaller domains, predominantly consisting of α-helices, between the front and the top sheets. Xylanases reveal an insertion of additional strands in their front sheet, leading to two additional β-hairpins. Galectins and the Charcot-Leyden protein demonstrate yet another variation, which can be described as a truncation of the first four strands in the sequence, resulting in one strand fewer in each of the main sheets compared with peanut lectin. Spermadhesins and the insecticidal toxin show larger truncations within the back and the front sheets, but again the topologies of the remaining strands are similar. These observations suggest that the fold can tolerate insertions and deletions such as those described without any disturbance to its overall nature.
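The strand-pairing list for peanut lectin fully determines which strands form each sheet. As an illustration, assuming each sheet is an open ladder (every strand pairs with at most two neighbours and there is no barrel-like closure), grouping the pairs into connected components and walking each component from an edge strand recovers the order of strands across each sheet. The helper `sheets_from_pairs` is hypothetical and not part of the original analysis.

```python
from collections import defaultdict

# Antiparallel strand pairs of peanut lectin as listed in the text;
# strands are numbered by their order in the sequence.
PAIRS = [(1, 18), (2, 5), (3, 4), (4, 17), (5, 16), (6, 13), (6, 18), (7, 12),
         (7, 17), (8, 9), (8, 16), (9, 10), (10, 11), (13, 14), (14, 15)]


def sheets_from_pairs(pairs):
    """Group strands into sheets and order each sheet from one edge strand.

    Assumes every sheet is a simple ladder: each strand pairs with at most
    two neighbours and the pairings contain no cycles.
    """
    partners = defaultdict(set)
    for a, b in pairs:
        partners[a].add(b)
        partners[b].add(a)
    visited, sheets = set(), []
    for start in sorted(partners):
        # Begin a walk only at an unvisited edge strand (exactly one partner).
        if start in visited or len(partners[start]) != 1:
            continue
        order, current = [], start
        while current is not None:
            order.append(current)
            visited.add(current)
            # Step to the next unvisited neighbour across the ladder, if any.
            nxt = [s for s in partners[current] if s not in visited]
            current = nxt[0] if nxt else None
        sheets.append(order)
    return sheets


for sheet in sheets_from_pairs(PAIRS):
    print(len(sheet), sheet)
# 6 [1, 18, 6, 13, 14, 15]
# 7 [2, 5, 16, 8, 9, 10, 11]
# 5 [3, 4, 17, 7, 12]
```

The three components reproduce the six-membered back sheet, the seven-stranded front sheet and the five-stranded top sheet, which is a quick consistency check on the pairing list.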

Derivation of minimal structural determinants. Compilation of the residues in the superimposable segments in all structures reveals that they are all close together in their sequential positions, indeed clustered into one long segment of the polypeptide corresponding to residues 64-192 of peanut lectin. This segment corresponds to a β-hairpin in the front sheet, a double hairpin of the back sheet and two short strands of the top sheet which are essentially extensions of two invariant strands of the back sheet. Interestingly, all these regions are contained within one contiguous segment of the polypeptide chain in all the structures studied here. This segment, highlighted in Figure 3, can be described as the minimal structural principle, or the determining region, of the legume lectin fold. The minor topological variations observed in some proteins, especially with reference to the positions of their chain termini (e.g. in ConA or the pentraxins), occur between the first and the second strands of the back sheet and do not alter the invariant region of the fold, consistent with the hypothesis that the invariant region is indeed the determining region. The only exception to this general consensus is observed in the structures of the insecticidal toxin and spermadhesin, where a segment corresponding to two strands of the back and front sheets within this determining region is deleted; even there, however, the remaining strands of the invariant region lie within one sequential segment.
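The observation that the invariant strands collapse into one contiguous stretch can be checked directly from the peanut lectin residue ranges quoted earlier. A sketch, assuming the inclusive ranges below, copied from the text: the invariant back- and front-sheet strands plus the two top-sheet strands (71-74 and 160-161) that extend invariant back-sheet strands. The helper `covering_segment` is hypothetical.

```python
# Invariant peanut lectin strands quoted in the text (inclusive residue ranges).
INVARIANT_BACK = [(64, 70), (162, 166), (173, 179), (186, 192)]
INVARIANT_FRONT = [(84, 90), (117, 124), (136, 143), (149, 153)]
# The two short top-sheet strands that extend invariant back-sheet strands.
TOP_EXTENSIONS = [(71, 74), (160, 161)]


def covering_segment(ranges):
    """Smallest contiguous residue interval containing every (start, end) range."""
    return min(start for start, _ in ranges), max(end for _, end in ranges)


segment = covering_segment(INVARIANT_BACK + INVARIANT_FRONT + TOP_EXTENSIONS)
print(segment)  # (64, 192)
```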
Quaternary structure

Legume lectins themselves exhibit different types of quaternary structure, not only in terms of the number of subunits in the oligomeric molecule but also in the nature of oligomerization, despite having nearly identical subunit structures. The dimerization of ConA and several other legume lectins involves the association of the two back β-sheets into a contiguous 12-stranded β-sheet in a side-by-side arrangement, with the dyad of the dimer perpendicular to the β-sheet thus formed. Dimerization in peanut lectin, lectin IV of Griffonia simplicifolia, Erythrina corallodendron lectin and the winged bean lectins instead involves a back-to-back arrangement of the two 'back' β-sheets. All tetrameric legume lectins are dimers of dimers, but here again considerable variability exists (Prabu et al., 1999). In this context, the most interesting case is presented by peanut lectin, in which the tetrameric molecule has an 'open' structure without the expected 222 or 4-fold symmetry (Banerjee et al., 1994). The ConA type of dimerization is exhibited by galectins, sialidase (a member of the neuraminidase family), arcelin-1 and the lectin-like inhibitor of amylase, while a back-to-back arrangement is found in the porcine plasma spermadhesin heterodimer. The pentameric pentraxins exhibit yet another type of oligomerization. The association of the A-E subunits appears rather weak, although it involves the back β-sheets; the association of the A-B subunits involves only the top β-sheet and some loops around it, thus forming a new mode of dimerization. It is worth mentioning that the invariant region is not involved in quaternary association in any of the structures studied here. Figure 4 illustrates that structures containing the legume lectin-like fold exist as monomers, different types of dimers, tetramers and also as domains of multi-domain proteins. Differences in quaternary association have been shown to give rise to differences in oligosaccharide specificity in bulb lectins (Chandra et al., 1999), although no such clear correlations have been discovered so far for the proteins used in this study.

Biological role and carbohydrate binding
The main criterion for a protein to be classified as a lectin is its ability to bind a carbohydrate, most often an oligosaccharide (Lis and Sharon, 1995). Lectins are known to mediate a variety of cellular interactions through their ability to bind carbohydrates specifically. It is not surprising, therefore, that different lectins are specific to different carbohydrates. A study of the crystal structures of more than 50 legume lectin-carbohydrate complexes has shown that the lectins bind the carbohydrates at the top of the concave side of the front sheet, involving interactions from the four loops (residues 91-106, 125-135, 75-83 and 211-216). The first three loops are largely common to all legume lectins, whereas the fourth varies in size and conformation and is thought to determine the specificity of the lectin. The predominant function of legume lectins, all exhibiting the same fold, is therefore carbohydrate binding (Sharma and Surolia, 1997). The obvious questions that this observation raises are whether the legume lectin-like fold is always involved in carbohydrate binding and whether the fold gives rise to that particular function. A brief description of the families of proteins in this study provides some insight into these questions.

Galectins, represented by 1SLC, are a family of soluble animal lectins that are cation dependent and bind to Galβ(1,4)GlcNAc-terminating oligosaccharides (Bourne et al., 1994). They have been implicated in the modulation of cell-cell interactions through carbohydrate-mediated recognition. The crystal structures of the five galectins reported so far all show that the carbohydrate binds on the front sheet. The front sheet, however, is not curved in galectins, and the loops corresponding to the four carbohydrate-binding loops of peanut lectin are either very small or absent. Carbohydrate binding is achieved through interactions of the charged and polar residues on the front sheet, especially an aspartic acid, an asparagine and an arginine. It appears that the longer side chains on the front sheet compensate for the loss of the loop regions and succeed in binding the carbohydrate in a broadly similar position. The human Charcot-Leyden crystal (CLC) protein, also referred to as galectin-10 owing to its structural similarity to galectins, is the major autocrystallizing constituent of human eosinophils and basophils during allergic inflammation and is known to possess lysophospholipase activity (Swaminathan et al., 1999). The CLC protein structure possesses a carbohydrate recognition site comprising most of the binding residues that are conserved among galectins. The protein exhibits specific, although weak, binding to mannose, N-acetylglucosamine and lactose. The binding site of mannose in the crystal structure of a complex is similar to the sugar-binding site in galectins.

Arcelin-1, a member of the phytohaemagglutinin family, is a glycoprotein from kidney beans (Phaseolus vulgaris) which displays insecticidal properties and protects the seeds from predation by the larvae of various bruchids (Mourey et al., 1998). This lectin-like protein, although devoid of monosaccharide-binding properties, exhibits specificity for various glycoproteins such as fetuin and asialofetuin. The related protein arcelin-5 has a different quaternary structure and binds monosaccharides specifically on the concave side of the front sheet, involving interactions similar to those observed in legume lectins (Hamelryck et al., 1996). The differences in function between the two arcelins, which arise from differences in their carbohydrate-binding properties, have been explained in terms of sequence and structural changes in the two proteins. The seeds of Phaseolus vulgaris contain another protein that inhibits α-amylase in the digestive tract of mammals and coleoptera and inhibits the growth of bruchid larvae (Bompard-Gilles et al., 1996). The structure of this protein, determined a few years ago, reveals a lectin-like domain with a tertiary structure very similar to that of ConA. The carbohydrate-binding loops in this protein are truncated, facilitating its binding to amylase.

Pentraxins are pentameric plasma glycoproteins characterized by calcium-dependent ligand binding. Their overall structures have been described earlier as being similar to that of legume lectins (Srinivasan et al., 1996). The loops within and between the β-sheets are much shorter in this family of proteins. The human serum amyloid P component binds to the 4,6-cyclic pyruvate acetal of β-D-galactose and to all forms of amyloid fibrils through calcium ions (Emsley et al., 1994). Although the positions of the calcium ions differ from the metal ion positions in legume lectins, the mode of recognition bears some resemblance in the two families, especially the role played by Gln148 in the serum amyloid component compared with that of Asn127 in peanut lectin. Human C-reactive protein, although belonging to the same structural class as the serum amyloid component, is functionally a completely different protein (Shrive et al., 1996). It is an acute-phase reactant that is expressed rapidly in response to infection or injury and is known to be involved in the enhancement of phagocytosis and the activation of complement through its ability to bind bacterial polysaccharides. The ligand is expected to bind at the concave side of the front sheet in both members of the pentraxin family.

Many microorganisms produce multiple forms of different 1,4-β-glycosidases in order to hydrolyse plant polysaccharides such as cellulose and xylan. Because of the complex nature of these polysaccharides, different glycosidases are required to complete their hydrolysis. These glycosidases are classified into endo- or exo-enzymes depending on how they catalyse the backbone breakage of their polymeric substrate (Wood, 1989). Microbial cellobiohydrolase is a representative of the exoglucanase/cellulase family, which has evolved to carry out the hydrolysis of cellulose, the major polysaccharide in plants (Divne et al., 1994). Its crystal structure shows that this protein specifically binds the 1,4-β-D-glucan moiety of cellulose. The glucanases (Keitel et al., 1993), also known as lichenases, represent a distinct family of glucanohydrolases that primarily hydrolyse the disordered amorphous regions of cellulose, cutting at internal glycosidic bonds. The two proteins act in synergy to achieve complete hydrolysis of cellulose. Although there are several differences between the two structures, especially in terms of insertions and the number of loops, both appear to bind carbohydrate moieties on the concave side of the front sheet, involving interactions of residues structurally related to those in legume lectins. In the case of cellobiohydrolase, the extra loop regions and insertions form a tunnel that encloses the product cellobiose (Divne et al., 1994). Xylanases, another class of endoglycosidases, are structurally very similar to glucanases and have their binding sites situated in a cleft on the concave side of the front sheet. In this class of proteins, clear conformational changes have also been observed upon ligand binding (Havukainen et al., 1990). These changes involve the extra insertions in the front sheet and also the nearby loop regions.

Two lectin-like domains have been observed on either side of the central sialidase domain of Vibrio cholerae neuraminidase (Crennell et al., 1994). Neuraminidase cleaves the glycosidic linkage between a terminal sialic acid and the penultimate sugar in various glycoconjugates. The environment of the small intestine requires the bacteria to secrete several adhesins. It is expected that the lectin-like domains mediate the protein's attachment to the adhesins. The ability of these domains to recognize carbohydrates, however, remains to be proven.

LNS domains are present in diverse proteins such as laminins, neurexins, agrins and slit (Rudenko et al., 1999). The structures of the G domain of laminin A (Hohenester et al., 1999), steroid binding protein (Grishkovskaya et al., 2000) and neurexins (Rudenko et al., 1999) have recently been determined and found to be extremely similar to each other. Neurexins are brain-specific cell surface proteins that are believed to be involved in neuron-neuron recognition and adhesion. The recently determined crystal structure of the LNS domains responsible for this function reveals β-sandwich motifs with striking similarity to legume lectins. The LNS domains in agrin and laminin A are known to bind heparin and other glycosaminoglycan components. The ligand-binding sites in these domains, however, differ from that in legume lectins and also among the different members containing this domain. This is not surprising given the broad range of ligands that these domains can bind. The structural homology between neurexin-1β and lectins raises the possibility that LNS domains may have a general function as carbohydrate-binding modules and that, in neurexins, protein-carbohydrate interactions might contribute to their cell-adhesive properties at neuronal junctions (Rudenko et al., 1999). While it remains to be investigated whether the LNS domains of neurexins do indeed bind sugars, their interactions with protein ligands such as α-latrotoxin and neuroligins are well characterized.

Tetanus neurotoxin (TeNT) is the sole causal agent of the pathological condition known as tetanus (Umland et al., 1997). TeNT is a member of the clostridial neurotoxin family, the most potent toxins known, and is closely related to the botulinum toxin family. Its extraordinary toxicity arises from two factors: first, the critical importance to neuroexocytosis of VAMP/synaptobrevin, the toxins' substrate; and second, the exquisite transport mechanism exploited by the toxin for delivery to its cytosolic target within the central nervous system. The receptor-binding subunit of TeNT plays a dominant role in this delivery process, perhaps through its ability to bind carbohydrates. The structure of the neurotoxin complexed with galactose has been determined recently (Emsley et al., 2000). The characteristic structural features of this protein, such as the topology, the concavity of the front sheet and the presence of the binding-site loops, are remarkably similar to those of legume lectins. The crystal structures of several complexes determined by Emsley et al. indicate that tetanus toxin has multiple carbohydrate-binding sites, the site on the domain adopting the β-trefoil fold being the best established. In addition, the β-sandwich domain also has several residues (particularly Tyr909, Glu932, Asp1067 and Asn1069) situated on the concave surface of the front sheet that appear suitable for binding a carbohydrate molecule, analogous to the sites in some of the other structures discussed here. Emsley et al. also discuss the possibility of subsite multivalency in these proteins, similar to that in lectins. Whether both domains are involved in carbohydrate recognition and, if so, whether they act together or separately, however, remains to be explored.

The crystal structure of the activated 65 kDa lepidopteran-specific CryIA and CryIIIA toxins from Bacillus thuringiensis, which belong to a large family of Cry proteins, reveals a domain (domain III) containing a β-sandwich structure made of two twisted antiparallel β-sheets forming a face-to-face sandwich (Grochulski et al., 1995). These toxins, also known as insecticidal crystal proteins, are synthesized intracellularly as inactive protoxins and, when activated in the gut juice, bind to high-affinity sites on the midgut epithelial cells. Analysis of the structural fold of this domain indicates that it is a variant of the jelly roll fold. The minimal determining region of the legume lectin-like fold appears intact in these proteins too, although the domain differs significantly from legume lectins because of differences in topology, a smaller number of strands in each sheet, the loss of concavity of the front sheet and the loss of equivalent binding-site loops. The involvement of this domain in receptor recognition has been suggested (Grochulski et al., 1995), although its exact role remains to be identified.
Spermadhesins are a family of conserved proteins known to be important in gamete recognition during fertilization (Romero et al., 1997). Several members of this family have been studied, and crystal structures are available for porcine seminal plasma proteins I and II (PSP-I and PSP-II) (Varela et al., 1997) and for the acidic seminal fluid protein (aSFP) (Romero et al., 1997). Although all members of the spermadhesin family share 60-98% amino acid sequence identity, they are not functionally equivalent. For example, the porcine spermadhesin AWN and its equine homologue HSP-7 display carbohydrate-binding activity through which they bind tightly to the sperm head membrane, whereas bovine aSFP does not show carbohydrate-binding activity but is thought to stimulate progesterone secretion by granulosa cells. All three structures, however, reveal a CUB domain architecture, which is a variant of the jelly roll motif. The three proteins have been shown to superimpose well among themselves, with r.m.s.ds below 1 Å (Romero et al., 1997), and the putative carbohydrate-binding site in PSP-II is suggested to be located in a shallow groove on the protein surface, similar in position to that in legume lectins and galectins (Varela et al., 1997). This is despite the loss of concavity and the shortening of loops in the spermadhesins.

Conclusions

The comparison of 15 different families described here suggests that their overall structures have some characteristic features common to all members. The topology is remarkably conserved in all the members, suggesting that it is a strict prerequisite for the fold. The variations that occur in spermadhesin and the insecticidal toxin do not change the nature of this region. The study also highlights the fact that the fold is compatible with different quaternary structures, formed in different families or by different members within a family. The fold appears to tolerate wide variation in the loops, especially those corresponding to the carbohydrate-binding site in legume lectins. The predominant function of proteins of these 15 families appears to be carbohydrate binding, through which they mediate higher order biological events, although in some cases the exact nature and specificity of the sugar are yet to be determined. Variations in the binding site and the loop lengths, observed in some cases, probably tailor the different proteins for the differences in ligand specificity required to perform a wide range of functions, such as those described here and surely many more yet to be determined. The comparative analysis presented here clearly identifies the minimal determining region of the fold and shows how it provides a common scaffold on which local structural variations can be built to achieve the flexibility and adaptability required for recognizing diverse carbohydrates. Recognition of such scaffolds in every fold could be useful for automating structural classification in the future, a need that will grow as the structural genomics projects begin to make headway. More importantly, these scaffolds provide discrete templates for use in fold recognition of a new protein, another acute need that has arisen from the sequencing of several genomes.
Erythrina cristagalli lectin (ECL) is a galactose-specific legume lectin. Although its biological function in the legume is unknown, ECL exhibits hemagglutinating activity in vitro and is mitogenic for T lymphocytes. In addition, it has recently been shown that ECL forms a novel conjugate when coupled to a catalytically active derivative of the type A neurotoxin from Clostridium botulinum, suggesting therapeutic potential. ECL is biologically active as a dimer in which each protomer contains a functional carbohydrate-combining site. The crystal structure of native ECL was recently reported in complex with lactose and 2'-fucosyllactose. ECL protomers adopt the legume lectin fold but form noncanonical dimers via the handshake motif, as was previously observed for Erythrina corallodendron lectin. Here we report the crystal structures of native and recombinant forms of the lectin in three new crystal forms, both unliganded and in complex with lactose. For the first time, the detailed structure of the glycosylated hexasaccharide of native ECL has been elucidated. The structure also shows that in the crystal lattice the glycosylation site and the carbohydrate-binding site are involved in intermolecular contacts through water-mediated interactions.

Key words: crystal structure/Erythrina cristagalli lectin/Erythrina corallodendron lectin/N-glycosylation/protein-carbohydrate interaction



Introduction 

Erythrina cristagalli lectin (ECL) is a galactose-specific legume lectin. Although its function in the legume is unknown, ECL has been shown in vitro to have hemagglutinating activity and to be mitogenic for human T lymphocytes (Iglesias et al., 1982). Recently, a conjugate comprising a catalytically active derivative of Clostridium botulinum neurotoxin A coupled to ECL has been used to selectively target nociceptive afferents (Duggan et al., 2002). The specificity of ECL in retargeting the potent endopeptidase activity of botulinum neurotoxin A to nociceptive afferents in vitro points to the potential therapeutic use of this conjugate in the treatment of chronic pain.

ECL has been well studied in terms of carbohydrate binding. It interacts more strongly with fucosyllactose and fucosyllactosamine (Moreno et al., 1997; Surolia et al., 1996; Teneberg et al., 1994) than with N-acetyllactosamine, lactose, N-acetylgalactosamine and galactose (Iglesias et al., 1982). ECL differs subtly from the other Erythrina lectins in that it has a similar affinity for fucosyllactose and fucosyllactosamine, whereas other members of the family exhibit a preference for fucosyllactose (Moreno et al., 1997). The crystal structure of native ECL (nECL) has recently been determined in complex with lactose and fucosyllactose, providing insight into its altered carbohydrate specificity (Svensson et al., 2002).

The nECL protomer adopts a jelly-roll topology, in common with other legume lectins. Each protomer contains one Ca2+ and one Mn2+ ion, both of which are required for carbohydrate-binding activity (Emmerich et al., 1994). These ions are situated close to the carbohydrate-binding site (combining site) and help maintain the correct spatial orientation of combining-site residues (Derewenda et al., 1989). A cis-peptide bond between residues Ala88 and Asp89 holds the side chain of Asp89 in the correct orientation for carbohydrate binding (Svensson et al., 2002).

nECL has been shown to be glycosylated at Asn17 and Asn113, with partial occupancy at Asn113, and the nature of the bound heptasaccharide has been characterized (Ashford et al., 1991). The major component of the carbohydrates bound to ECL contains two N-acetylglucosamine (GlcNAc) residues, one fucose, one xylose, and three mannose residues. Native ECL exists as a dimer in which two protomers associate back-to-back, forming a handshake motif. This noncanonical mode of dimerization was first observed in Erythrina corallodendron lectin (ECorL) (Shaanan et al., 1991), which shares 96% sequence identity with nECL. In ECorL, the ordered heptasaccharide bound to Asn17 is believed to prevent formation of the canonical dimer (Shaanan et al., 1991), and it has been suggested that this may explain why nECL dimers also adopt the handshake motif (Svensson et al., 2002). However, the reported crystal structure of nECL did not provide the structural details of the bound heptasaccharide.
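A figure such as "96% sequence identity" depends on how identity is counted. The sketch below assumes the two sequences are already aligned to equal length with "-" marking gaps, and it divides identical positions by the number of gap-free aligned columns; other conventions (for example, dividing by the length of the shorter sequence) give slightly different values. The function name and toy sequences are illustrative only.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity over gap-free columns of two pre-aligned sequences.

    Assumption: both strings come from the same alignment (equal length,
    '-' marks a gap). Columns where either sequence has a gap are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            continue  # ignore gapped columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * identical / compared


# Hypothetical 25-residue toy alignment with a single mismatch: 24/25 identical.
print(percent_identity("ACDEFGHIKLMNPQRSTVWYACDEF", "ACDEFGHIKLMNPQRSTVWYACDEW"))  # 96.0
```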

Despite the importance of oligosaccharide interactions in glycoproteins, little structural knowledge of them has been gained by X-ray crystallography over the years. This is mainly because the inherent mobility and chemical heterogeneity of oligosaccharides hinder crystallization. However, in a few cases (such as ECorL) this has been achieved (Shaanan et al., 1991). Here we report the crystallization and structures of native and recombinant ECL (nECL and recECL, respectively) in three new crystal forms. From the nECL structure, we were able to obtain a detailed picture of most of the oligosaccharide bound on Asn113 (a hexasaccharide portion of the heptasaccharide). The protomers of recECL adopt the legume lectin fold, demonstrating the structural equivalence of the recombinant form despite its lack of glycosylation. We examined the structural effects of glycosylation on quaternary structure by comparing the dimers formed by native and recombinant ECL. We confirm that dimers of both recombinant and native ECL adopt the handshake motif and suggest that structural factors other than the presence of glycosylation induce the protein to form a noncanonical dimer. Furthermore, we have studied the mode of lactose binding, which is similar to that observed for ECorL.



Results and discussion 

nECL 

nECL cocrystallized with lactose in space group P6₅, with two molecules per crystallographic asymmetric unit and 67% of the crystal volume occupied by solvent. The final model (dimer) at 2.0 Å resolution (Rcryst 18.97%, Rfree 20.92%) contains 477 amino acids, 2 calcium and 2 manganese ions, 2 lactose molecules, 4 GlcNAc residues, 2 fucose residues, 1 xylose, 2 mannose residues, 4 HEPES molecules, and 456 water molecules (Table I). The estimated Luzzati coordinate error is 0.22 Å. Analysis of the Ramachandran plot revealed that more than 99% of the amino acids have main-chain conformations in allowed or additionally allowed regions. Tyr106 in each molecule adopts a generously allowed conformation. The first three residues of the heptasaccharide bound to Asn113 were modeled into the electron density in one of the two protein molecules in the asymmetric unit, whereas electron density for six of the seven sugar residues was observed in the other.
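A solvent content such as the 67% quoted for nECL is conventionally estimated from the Matthews coefficient, V_M = V_cell / (Z x M), with solvent fraction approximately 1 - 1.23/V_M. The sketch below applies this to the nECL cell in Table I (a = 134.02 Å, c = 81.64 Å, hexagonal), with Z = 12 (six asymmetric units in P6₅, two molecules each). The molecular mass is a hypothetical placeholder (241 residues x 110 Da, ignoring the glycan), so the result is only a rough cross-check, not the authors' calculation.

```python
import math


def cell_volume(a, b, c, alpha, beta, gamma):
    """General (triclinic) unit-cell volume in Å^3; edges in Å, angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(x)) for x in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


def solvent_fraction(volume, n_molecules_in_cell, mass_da):
    """Matthews estimate: V_M = V / (n * M) in Å^3/Da; solvent ≈ 1 - 1.23 / V_M."""
    vm = volume / (n_molecules_in_cell * mass_da)
    return vm, 1.0 - 1.23 / vm


# nECL, space group P6₅: 6 asymmetric units x 2 molecules per cell.
# Hypothetical mass: 241 residues x 110 Da (glycan ignored).
volume = cell_volume(134.02, 134.02, 81.64, 90.0, 90.0, 120.0)
vm, solvent = solvent_fraction(volume, n_molecules_in_cell=12, mass_da=241 * 110)
print(f"V_M = {vm:.2f} Å^3/Da, solvent fraction = {solvent:.2f}")  # V_M = 3.99, 0.69
```

The estimate lands close to, but slightly above, the reported 67%; the crude mass (no glycan, average residue mass) may account for part of the difference.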

recECL 

recECL crystallized in a triclinic form (space group P1), with four molecules per unit cell (49% of the crystal volume occupied by solvent). The final model at 2.13 Å resolution (Rcryst 19.1%, Rfree 24.1%) contains 958 amino acids (molecule A, 1-239; molecule B, 1-239; molecule C, 1-240; and molecule D, 1-240), 4 calcium and 4 manganese ions, 3 glycerol molecules, and 879 water molecules (Table I). The Luzzati coordinate error is 0.23 Å, and the average B factor for protein atoms is 24.52 Å². |Fo| − |Fc| electron density in the combining site of three protomers was identified as glycerol, which had been included in the cryoprotectant solution used during data collection. A single residue, Tyr106, in only one of the four protomers is located in a generously allowed region of the Ramachandran plot. Possible alternative conformations were observed in the electron density map for the side chain of Ser120, but not in all of the molecules in the unit cell.

recECL was also cocrystallized with lactose in space group P2₁, with four molecules per asymmetric unit (49% solvent). The final model at 1.7 Å resolution (Rcryst 17.79%, Rfree 20.31%) contains 959 amino acids, 4 lactose molecules, 4 calcium and 4 manganese ions, and 1119 water molecules (Table I). The Luzzati coordinate error is 0.20 Å, and the average B factor for protein atoms is 19.99 Å². Five amino acid residues are located in generously allowed regions of the Ramachandran plot: Tyr106 from each protomer and Asp221 from one of the four protein molecules. Several side chains were observed to have potential alternative conformations (Met95, Asp161, Leu180, and His234), but not in all of the protomers.

Table I. Crystallographic data processing and refinement statistics

                                nECL (lactose)   recECL         recECL (lactose)
Data processing
  Space group                   P6₅              P1             P2₁
  Cell dimensions (Å)           a = 134.02       a = 55.28      a = 54.90
                                b = 134.03       b = 55.37      b = 167.23
                                c = 81.64        c = 86.93      c = 55.1?
  Cell angles (°)               α = β = 90       α = 86.23      α = γ = 90
                                γ = 120          β = 75.37      β = 97.09
                                                 γ = 82.13
  Resolution (Å)                2.00             2.13           1.70
  Reflections measured          414,754          399,099        1,021,770
  Unique reflections            56,451           55,200         10?,?95
  Completeness (%)              ?                93.1           94.1
    in the outermost shell      99.9             86.1           77.3
  <I/σ(I)>                      18.44            10.93          28.05
    in the outermost shell      6.22             5.29           5.26
  Rmerge                        0.070            0.077          0.05?
Refinement
  Rcryst (%)                    18.97            19.10          17.79
  Rfree (%)                     20.92            24.10          20.31
  Average B factor (Å²)         28.?4            24.52          19.99
  Wilson B (Å²)                 23.1             25.60          19.20
  RMSD from ideality
    Bonds (Å)                   0.005            0.009          0.006
    Bond angles (°)             1.44             1.47           1.49
    Dihedrals (°)               25.32            26.15          26.10
    Impropers (°)               0.78             0.91           0.7?

Rmerge = Σhkl Σi |Ii(hkl) − <I(hkl)>| / Σhkl Σi Ii(hkl), where <I(hkl)> is the averaged intensity of the i observations of reflection hkl.
Rcryst = Σ||Fo| − |Fc|| / Σ|Fo|, where Fo and Fc are the observed and calculated structure factor amplitudes, respectively.
Rfree is equal to Rcryst for a randomly selected 5% subset of reflections not used in the refinement.
?: illegible in the original.
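The Rmerge definition in the Table I footnotes can be sketched as follows. This assumes the observations have already been indexed and symmetry-equivalent measurements mapped to a common hkl; it is an illustration of the formula, not the scaling program's implementation. Conventions differ on whether singly measured reflections are counted in the denominator; here they are included.

```python
from collections import defaultdict


def r_merge(observations):
    """Rmerge = sum_hkl sum_i |I_i(hkl) - <I(hkl)>| / sum_hkl sum_i I_i(hkl).

    `observations` is an iterable of (hkl, intensity) pairs. Assumption:
    symmetry-equivalent observations already share the same hkl key.
    """
    groups = defaultdict(list)
    for hkl, intensity in observations:
        groups[hkl].append(intensity)
    numerator = denominator = 0.0
    for intensities in groups.values():
        mean = sum(intensities) / len(intensities)
        numerator += sum(abs(i - mean) for i in intensities)
        denominator += sum(intensities)
    return numerator / denominator


# Toy data: (1,0,0) measured twice (100, 110); (0,1,0) measured twice (50, 50).
obs = [((1, 0, 0), 100.0), ((1, 0, 0), 110.0), ((0, 1, 0), 50.0), ((0, 1, 0), 50.0)]
print(round(r_merge(obs), 4))  # 10 / 310 -> 0.0323
```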

Overall structure 

Protomers of ECL adopt the conserved jelly-roll fold that is characteristic of legume lectins (Figure 1). This structural motif comprises a six-stranded back β-sheet, a curved seven-stranded front β-sheet, a short five-membered β-sheet, and a set of loops connecting the three sheets. Both the six- and seven-stranded β-sheets are entirely antiparallel. The N- and C-termini associate together, forming the first two strands of the six-stranded β-sheet.

Fig. 1. The structure of the ECL dimer. Protomers of ECL associate back-to-back to form noncanonical dimers via the handshake motif, as was first observed for ECorL. The dimer structure is stabilized by two hydrogen bonds and a series of contacts between side chains in four strands of the flat, six-stranded β-sheet. The protomers are tilted with respect to one another such that the N- and C-termini play no part in the dimer interface. Each protomer adopts the conserved jelly-roll fold characteristic of legume lectins. The N-linked carbohydrate (on Asn113), lactose bound at the combining site and HEPES molecules (from the crystallization medium) are shown. The manganese and calcium ions bound in the vicinity of the combining site are shown as small spheres.

The crystal structures of nECL and recECL at 2.0 Å and 2.13 Å resolution, respectively, superimpose with an overall root mean squared deviation of 0.26 Å (Cα atoms). Comparison of the two structural models indicated that there are no significant differences between the crystal structures of nECL and recECL, thus confirming that recECL adopts a native-like structure. The main difference between the native and recombinant forms is the absence of N-linked carbohydrate in recECL.
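The root mean squared deviations quoted for superimposed structures (0.26 Å over Cα atoms here) are computed over matched atom pairs. The minimal sketch below assumes the two Cα sets are already paired one-to-one and already superposed; it does not perform the least-squares fit itself (that would require, for example, the Kabsch algorithm). The function name and coordinates are illustrative.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (same units as the input, e.g. Å) between paired (x, y, z) tuples.

    Assumption: atoms are matched one-to-one (e.g. equivalent Cα atoms) and
    the two structures are already superposed; no fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Toy example: every atom displaced by 0.3 Å along z gives an RMSD of 0.3 Å.
model_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 0.3), (3.8, 0.0, 0.3)]
print(round(rmsd(model_1, model_2), 3))  # 0.3
```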

Each ECL protomer contains one Ca2+ and one Mn2+ ion, both located close to the carbohydrate-combining site. The metal ions are approximately 4.2 Å apart; each coordinates two water molecules and makes contacts with four amino acids in the vicinity of the combining site. One of these structural waters also stabilizes the conserved cis-peptide bond (Ala88-Asp89) that correctly orients the side chain of Asp89 for carbohydrate binding. Mn2+ makes contacts with the side chains of Glu127, Asp129, Asp136, and His142, and Ca2+ makes contacts with Asp129, Phe131, Asn133, and Asp136 (Table II). In addition to binding the calcium ion, Phe131 and Asn133 are also implicated in lactose binding (see following discussion).

ECL exists as a biologically active dimer and has not been isolated in monomeric form. Both nECL and recECL form noncanonical back-to-back dimers via the handshake motif, in which the two protomers are tilted with respect to each other (Figure 1). This mode of dimerization has previously been reported for ECorL and the basic winged bean lectin (Prabu et al., 1998; Shaanan et al., 1991). The dimer interface lies between the six-stranded sheets of each protomer. It is stabilized by two hydrogen bonds between the side chain of Lys171 in one protomer and Thr193 in the other, and by a number of van der Waals contacts between residues in four strands of the six-stranded β-sheet; the strands formed by the N- and C-termini do not take part in stabilizing the quaternary structure. Residues Arg73, Glu79, Gln80, Pro81, Tyr82, Thr83, Arg84, Lys116, Gln117, Asp118, Asn119, Asn14[?], Asp161, Asn162, Gl[?]164, Lys171, Ile191, Thr193, Gln202, Val203, and Asp221 mediate these contacts at the dimer interface.

Binding of N-linked glycosylated saccharide 
The N-linked oligosaccharide of nECL is covalently attached to Asn113, which is part of the loop located between strands β5 and β6. This residue is the only possible site of attachment for N-linked glycosylation in the 241-amino-acid sequence of ECL. In one of the two protein molecules (molecule B) in the asymmetric unit, there was enough electron density to model six of the seven sugar residues bound to Asn113 (Figure 2). The modeled hexasaccharide has the profile α-D-Man-(1→3)-[β-D-Xyl-(1→2)]-β-D-Man-(1→4)-β-D-GlcNAc-(1→4)-[α-L-Fuc-(1→3)]-D-GlcNAc. It does not make contacts with any other part of the lectin protomer to which it is bound, although it might communicate with the lactose moiety bound in the combining site of a symmetry-related molecule through water-mediated interactions.

Influence of glycosylation on the quaternary structure
ECL shares 96% sequence identity with ECorL, which was thought to dimerize in a noncanonical fashion because of the glycosylation attached to Asn17. Given their high sequence identity, one would expect ECL to dimerize in the same way as ECorL, but the form of ECL studied in this investigation lacks the N-linked glycosylation site at position 17 (which is located at the opposite end of the molecule from Asn113 in ECL). This raises an interesting question about its mode of dimerization: if there is no heptasaccharide bound to residue 17, what might force ECL to form a noncanonical dimer? Several studies have been undertaken to rationalize the various oligomerization states of legume lectins in terms of their amino acid sequences, shape complementarity of protomers, interaction energy between protomers, and hydrophobic surface area buried on oligomerization (Elgavish and Shaanan, 2001; Manoj and Suguna, 2001; Prabu et al., 1999; Srinivas et al., 2001). The results of these studies indicate that the observed modes of oligomerization are energetically more favorable than any alternative quaternary structures.

Table II. Contacts between ECL and metal ions

Metal ion   Amino acid side chain / water   Distance (Å)
Ca2+        Asp136 OD2                      ?
            Asp129 OD2                      2.5?
            Asp129 OD1                      2.60
            Asn133 OD1                      2.45
            Water                           2.54
            Water                           2.55
            Asp129 CG                       2.9
            Asp129 OD1                      2.6
            Asp129 OD2                      2.5
            Phe131 O                        2.5
            Asn133 OD1                      2.5
Mn2+        His142 NE2                      2.39
            Glu127 OE2                      2.22
            Asp129 OD2                      2.23
            Asp136 OD1                      2.16
            Water                           2.33
            Water                           3.96
            Water                           2.21
            Glu127 OE2                      2.2
            Asp129 CG                       3.2
            Asp129 OD2                      2.?2
            Asp136 CG                       3.1
            Asp136 OD1                      2.2
            His142 NE2                      2.4
            His142 CE1                      3.2

Contacts formed between ECL and the calcium and manganese ions bound close to the carbohydrate-combining site. Each metal ion is bound close to the ECL combining site through interactions with four amino acid side chains and two water molecules, one of which also stabilizes the conserved cis-peptide bond. ?: illegible in the original.

We confirm that both native and recombinant forms of ECL associate back-to-back into dimers via the handshake motif. Because recECL is unglycosylated, this demonstrates that the presence of N-linked glycosylation does not influence the mode of dimerization in ECL and suggests that factors intrinsic to the primary structure of the lectin dictate its quaternary structure. Examination of legume lectin sequences, focusing on the regions forming interfaces between protomers and dimers, has identified amino acid residues that might influence the mode of oligomerization (Manoj and Suguna, 2001). Extrapolating these results to ECL reveals that the primary structure of this lectin contains features (Glu2, Glu12, Lys55, Arg73, and Lys171) indicating that it could be expected to form ECorL-type dimers. Lectins that do not form canonical dimers have an acidic residue at the position corresponding to Lys55 in ECL, and a charged residue equivalent to Glu12 that comes into close contact with a charged residue (Glu2), which would form unfavorable interactions if a concanavalin A-type dimer were formed (Manoj and Suguna, 2001). Arginine and lysine residues at positions equivalent to 73 and 171 in ECL are conserved among lectins forming ECorL-type dimers.
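The sequence-feature argument above amounts to looking up a handful of positions. The sketch below is a hypothetical helper that reports whether a sequence, already numbered and aligned like ECL, carries the same residues ECL has at the five positions named in the text (Glu2, Glu12, Lys55, Arg73, Lys171). It performs no alignment, and a match says nothing on its own about the actual dimer type; the marker table and toy sequence are illustrative assumptions.

```python
# Hypothetical marker table: ECL residues named in the text (1-based ECL numbering).
ECL_MARKERS = {2: "E", 12: "E", 55: "K", 73: "R", 171: "K"}


def check_markers(sequence: str, markers=ECL_MARKERS):
    """Return {position: (expected, observed, matches)} for each marker position.

    Assumption: `sequence` is already numbered like ECL (no alignment is done).
    Positions beyond the end of the sequence are reported with observed=None.
    """
    report = {}
    for pos, expected in sorted(markers.items()):
        observed = sequence[pos - 1] if 0 < pos <= len(sequence) else None
        report[pos] = (expected, observed, observed == expected)
    return report


# Toy poly-Ala sequence carrying the ECL residues at the marker positions.
toy = ["A"] * 200
for pos, aa in ECL_MARKERS.items():
    toy[pos - 1] = aa
print(all(ok for _, _, ok in check_markers("".join(toy)).values()))  # True
```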

Lactose binding 

The combining site is located in a shallow cleft on the surface of ECL and accommodates the galactose moiety of bound carbohydrates. The crystal structures of the nECL-lactose and recECL-lactose complexes superimpose with an overall root mean squared deviation of 0.27 Å. The spatial arrangement of residues in the combining sites of both forms of the lectin is identical, confirming that recECL binds lactose in the same way as nECL. Both nECL and recECL bind lactose through a set of structural water molecules. These water molecules mediate indirect hydrogen bonds between Gly107, Asn133, Ala218, and Gln219 and the O2, O3, and O6 of galactose and the O2 of glucose. Lactose also makes contacts with further water molecules in the combining site (Figure 3). Hydrophobic stacking interactions were observed between the aromatic ring of Phe131, the galactose ring of the bound lactose and the side chain of Tyr106, forming a sandwich with the galactose ring between the aromatic side chains. Tyr106 adopts a generously allowed conformation in the lactose-bound structures, as determined by the Ramachandran plot. In the unliganded recECL structure, Tyr106 in only one of the four protein molecules was located in an additional allowed region of the plot. Thus it appears that the conformation of Tyr106 is affected by carbohydrate binding.

The mode of lactose binding in ECL is similar to that observed for ECorL (Shaanan et al., 1991), although there are subtle differences in their carbohydrate specificities (Moreno et al., 1997; Teneberg et al., 1994). The altered carbohydrate specificity of ECL compared with ECorL is postulated to be due to differences in their amino acid sequences at positions 111 and 125 (Svensson et al., 2002): substitution of residues at these positions causes rotation of the side chain of Val92, which is thought to induce structural changes in the combining site. Analysis of the crystal structure of nECL revealed that although there are no contacts between these three residues, Val92 makes contacts with Val126, which is located close to the combining site. It is possible that substitutions causing movement of Val92 might indirectly affect carbohydrate binding. However, the real effect of Val92 on carbohydrate binding remains to be fully analyzed.

Fig. 3. Binding of lactose to ECL. Hydrophobic interactions were observed between the side chains of Tyr106 and Phe131 and the galactose ring of the lactose molecule. This sandwiches the sugar ring between the aromatic side chains of the two amino acids. Lactose makes a number of contacts with side chains of combining-site residues, but only the galactose moiety is accommodated in the shallow cleft of the combining site. There are no direct hydrogen bonds between the lectin and the disaccharide; instead, interactions are mediated by a set of structural water molecules. Key residues involved in carbohydrate binding are [?]86, Asp89, Gly107, Asn133, Ala218, and Gln219.

The structures of recECL, unliganded and in complex with lactose (at 2.13 Å and 1.70 Å resolution, respectively), superimpose with an overall root mean squared deviation of 0.21 Å. A direct comparison of side-chain positions in the combining sites of the two structures revealed that there are no structural rearrangements on lactose binding. This indicates that the amino acid residues involved in carbohydrate recognition are already optimally oriented in the ECL protomer.

Comparison with previously reported structure of nECL in the presence of lactose

As already mentioned, the crystal structures of nECL in complex with lactose and 2'-fucosyllactose were previously reported (Svensson et al., 2002). Comparison of the amino acid sequence of nECL used in this investigation with that reported for native ECL (hereafter referred to as SvenECL) revealed 10 differences: Asn16Asp, Asp17Asn, Ile25Leu, Ile59Met, Met62Ser, Ser175Pro, Leu180His, Ala181Val, Glu206Asp, and His234Gln (where the first residue is that in nECL and the second is that in SvenECL). The differences at positions 25, 181, and 206 represent conservative substitutions. The residues at positions 16 and 17 appear to be swapped in nECL compared with SvenECL, but because the side chains are of the same size and shape, these amino acids cannot be distinguished from one another on the basis of their electron density. The sequence of recECL was determined by DNA sequencing of multiple clones from two independent sources of E. cristagalli (Stancombe et al., 2002), and analysis of the electron density maps confirmed this sequence at all 10 locations where it differs from SvenECL (obtained by mass spectrometric peptide mapping, tandem mass spectrometry, and X-ray structure; Svensson et al., 2002). However, it is important to note that none of the sequence differences affect amino acids that are involved in carbohydrate binding or in stabilizing the dimer interface. The main difference is the absence of a potential N-linked glycosylation site at residue 17 in nECL. The electron density maps for nECL were carefully inspected for any evidence that residue 17 is asparagine rather than aspartic acid, but no density was observed that might represent bound carbohydrate. Therefore we are confident that the sequence of nECL matches that of recECL.
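The substitution list above uses the compact "<residue><position><residue>" notation. As a small illustration, the hypothetical helpers below parse that notation and flag neighbouring positions whose residues are simply exchanged between the two sequences, which is how the Asn16/Asp17 swap appears in the list. The regular expression assumes three-letter residue codes exactly as written in the text.

```python
import re

# Assumed notation: three-letter code, position, three-letter code (e.g. "Asn16Asp").
SUBSTITUTION = re.compile(r"^([A-Z][a-z]{2})(\d+)([A-Z][a-z]{2})$")


def parse_substitutions(text):
    """Parse 'Asn16Asp, Asp17Asn, ...' into (position, residue_in_first, residue_in_second)."""
    parsed = []
    for token in (t.strip() for t in text.split(",")):
        match = SUBSTITUTION.match(token)
        if not match:
            raise ValueError(f"unrecognised substitution: {token!r}")
        first, pos, second = match.groups()
        parsed.append((int(pos), first, second))
    return parsed


def adjacent_swaps(parsed):
    """Return (p, p + 1) pairs whose residues are exchanged between the two sequences."""
    by_pos = {pos: (a, b) for pos, a, b in parsed}
    return [(p, p + 1) for p in sorted(by_pos)
            if p + 1 in by_pos and by_pos[p] == tuple(reversed(by_pos[p + 1]))]


DIFFS = ("Asn16Asp, Asp17Asn, Ile25Leu, Ile59Met, Met62Ser, "
         "Ser175Pro, Leu180His, Ala181Val, Glu206Asp, His234Gln")
subs = parse_substitutions(DIFFS)
print(len(subs), adjacent_swaps(subs))  # 10 [(16, 17)]
```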

Conclusions 

In this investigation nECL was crystallized in a new crystal form in complex with lactose, and recECL was crystallized for the first time in two different crystal forms, both unliganded and in complex with lactose. We confirm that the tertiary structure of both forms of ECL is homologous to the known structures of other legume lectins, with protomers adopting the jelly-roll (legume lectin) fold. We have confirmed that nECL protomers associate back-to-back via the handshake motif, and we have shown that recECL protomers dimerize in the same way. From the comparison of native and recombinant forms of the lectin, it can be concluded that the presence of bound oligosaccharide does not influence the tertiary or quaternary structure of ECL. Furthermore, comparison of the structures of nECL and recECL in complex with lactose confirms that the presence of N-linked glycosylation on nECL has no major effect on the structure of the combining site or on carbohydrate binding. Thus we confirm that recECL is native-like in terms of both its structure and biological activity.

Materials and methods 

Protein purification and crystallization 
nECL was purchased from Sigma (Dorset, UK). recECL was expressed and purified as previously reported (Stancombe et al., 2003). Briefly, E. cristagalli seeds (Sandeman Seeds, Oxford Botanical Gardens, UK) were germinated, and genomic DNA obtained from leaf material was amplified by polymerase chain reaction and cloned into vector pMTL1015. For expression, this clone was transformed into Escherichia coli BL21 (DE3) cells and cultured at 8-L fermentation scale at 37°C until stationary phase (OD600 = 30). The protein was solubilized from inclusion bodies and subsequently refolded and purified on an immobilized lactose matrix. After extensive washing, recECL was eluted by the addition of 0.3 M lactose.

Crystallization was achieved using the vapor diffusion method (hanging drops) at 16°C, and crystals were observed within 3-4 weeks. For crystallization of nECL in space group P6₅, drops made up of 2 µl protein solution, 2 µl mother liquor (70% 2-methyl-2,4-pentanediol and 0.1 M HEPES, pH 7.0) and 0.4 µl 100 mM lactose solution were equilibrated over sealed wells containing 800 µl mother liquor. recECL crystallized in space group P1 from drops containing 2 µl protein and 2 µl mother liquor (17% polyethylene glycol [PEG] 3350, 0.3 M sodium chloride, and 0.02 M imidazole). recECL was also cocrystallized with lactose using mother liquor made up of 20% PEG 3350 and 0.2 M imidazole.
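The concentrations in a freshly set drop follow from simple dilution (C1V1 = C2V2). The sketch below applies this to the nECL drop described above (2 µl protein + 2 µl mother liquor + 0.4 µl of 100 mM lactose). It assumes volumes are additive and gives starting values only: during vapor diffusion the drop loses water toward the reservoir, so concentrations rise afterwards. The helper name is illustrative.

```python
def diluted(stock_conc, added_volume, total_volume):
    """Concentration of one component immediately after mixing (C1*V1 = C2*V2).

    Assumption: volumes are additive; vapor equilibration is not modeled.
    """
    if not 0 < added_volume <= total_volume:
        raise ValueError("added volume must be positive and within the total volume")
    return stock_conc * added_volume / total_volume


# nECL drop described in the text (volumes in µl).
PROTEIN_UL, MOTHER_LIQUOR_UL, LACTOSE_UL = 2.0, 2.0, 0.4
total_ul = PROTEIN_UL + MOTHER_LIQUOR_UL + LACTOSE_UL

lactose_mM = diluted(100.0, LACTOSE_UL, total_ul)       # 100 mM lactose stock
mpd_percent = diluted(70.0, MOTHER_LIQUOR_UL, total_ul)  # 70% MPD in mother liquor
hepes_mM = diluted(100.0, MOTHER_LIQUOR_UL, total_ul)    # 0.1 M = 100 mM HEPES
print(f"lactose {lactose_mM:.1f} mM, MPD {mpd_percent:.1f}%, HEPES {hepes_mM:.1f} mM")
# lactose 9.1 mM, MPD 31.8%, HEPES 45.5 mM
```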

X-ray data collection and structure determination 
X-ray diffraction data were collected at 100 K using the Synchrotron Radiation Source at Daresbury, UK, on station PX14.1 (wavelength 1.483 Å). Raw data images for nECL were indexed and integrated using DENZO (Otwinowski and Minor, 1997) and then scaled with SCALEPACK (Otwinowski and Minor, 1997). X-ray images collected for the recECL crystals were processed and scaled using HKL2000 (Otwinowski and Minor, 1997) (Table I).

Initial phases were obtained by the molecular replacement method using the program AMoRe (Navaza, 1994). For nECL, the crystal structure of ECorL at 1.95 Å resolution (PDB code 1AX1) (Elgavish and Shaanan, 1998) was used as an initial model, with the nine nonconserved residues mutated to alanine and all waters, ligands, and metal ions removed. Refinement was performed using the CNS suite of programs (Brunger et al., 1998). After one round of refinement, the crystallographic R factor (Rcryst) dropped to 29.59% (Rfree 32.38%, based on 5% of reflections omitted from the refinement). Calculated phases from the refined structure were used to compute |Fo| − |Fc| and 2|Fo| − |Fc| electron density maps. Careful examination of the |Fo| − |Fc| map allowed mutation of the nonconserved residues back to their native side chains and the addition of calcium and manganese ions to the model. HEPES and lactose were built into the electron density of each protomer, with parameter and topology files from the HIC-Up server (Kleywegt and Jones, 1998). Similarly, the glycosylation sugars (three in molecule A and six in molecule B, part of the heptasaccharide bound to Asn113) were modeled into the |Fo| − |Fc| electron density map, riding on the Asn113 residue.
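The Rcryst and Rfree values reported here follow the definitions in the Table I footnotes: the same R formula evaluated on the working reflections and on a randomly chosen 5% test set that is excluded from refinement. The sketch below illustrates only that bookkeeping with plain lists of amplitudes; it is not how CNS computes or stores these quantities, and the helper names and toy data are assumptions. In practice the test set is chosen once and kept fixed throughout refinement.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum | |Fo| - |Fc| | / sum |Fo| over the supplied reflections."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(abs(fo) for fo in f_obs)
    return numerator / denominator


def split_free_set(n_reflections, fraction=0.05, seed=0):
    """Randomly flag `fraction` of reflection indices as the free (test) set."""
    rng = random.Random(seed)
    n_free = max(1, round(fraction * n_reflections))
    return set(rng.sample(range(n_reflections), n_free))


def r_work_and_free(f_obs, f_calc, free_set):
    """Return (Rcryst over working reflections, Rfree over the test set)."""
    work = [i for i in range(len(f_obs)) if i not in free_set]
    free = sorted(free_set)
    r_work = r_factor([f_obs[i] for i in work], [f_calc[i] for i in work])
    r_free = r_factor([f_obs[i] for i in free], [f_calc[i] for i in free])
    return r_work, r_free


# Toy data: calculated amplitudes uniformly 10% low, so both R values are 0.10.
f_obs = [float(i + 1) for i in range(1000)]
f_calc = [0.9 * f for f in f_obs]
free = split_free_set(len(f_obs))
print(len(free), [round(r, 3) for r in r_work_and_free(f_obs, f_calc, free)])  # 50 [0.1, 0.1]
```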

Repeated rounds of refinement and model building were carried out to improve the model. After several rounds of refinement, water molecules were added to the structure where peaks in the |Fo| − |Fc| electron density maps had heights greater than 3σ and lay at hydrogen-bond-forming distances from appropriate atoms. 2|Fo| − |Fc| maps were also used to check the consistency of the peaks. Water molecules with a temperature factor of >60 Å² were excluded from subsequent refinement steps.
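The water-picking rules above can be written as two simple filters. In the sketch below, the 3σ peak threshold and the 60 Å² B-factor cutoff come from the text, while the hydrogen-bonding distance window (2.5 to 3.5 Å) and the notion of "polar atoms" are assumptions standing in for "hydrogen-bond-forming distances from appropriate atoms". Coordinates are assumed to be Cartesian, in Å.

```python
import math


def is_candidate_water(peak_xyz, peak_height_sigma, polar_atoms_xyz,
                       sigma_cutoff=3.0, hbond_min=2.5, hbond_max=3.5):
    """Accept a |Fo| - |Fc| peak as a water if it exceeds sigma_cutoff and lies
    within hbond_min..hbond_max Å of at least one polar atom.

    Assumption: the 2.5-3.5 Å window is illustrative, not taken from the source.
    """
    if peak_height_sigma <= sigma_cutoff:
        return False
    return any(hbond_min <= math.dist(peak_xyz, atom) <= hbond_max
               for atom in polar_atoms_xyz)


def keep_water(b_factor, b_max=60.0):
    """Keep a refined water only if its temperature factor does not exceed b_max (Å^2)."""
    return b_factor <= b_max


# A 4.2σ peak 2.8 Å from a polar atom passes; a weak or isolated peak does not.
polar = [(2.8, 0.0, 0.0)]
print(is_candidate_water((0.0, 0.0, 0.0), 4.2, polar),   # True
      is_candidate_water((0.0, 0.0, 0.0), 2.5, polar),   # False (below 3σ)
      is_candidate_water((0.0, 0.0, 6.0), 4.2, polar))   # False (too far)
```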

The recECL structures were subsequently solved using the refined coordinates of nECL (at 2.0 Å resolution) as a search model. Model building and refinement procedures were undertaken as described for nECL. For recECL in space group P1, glycerol, rather than lactose, was modeled into the carbohydrate-combining site.

The program PROCHECK (Laskowski et al., 1993) was used to assess the quality of each structure after the final round of refinement. Analysis of the Ramachandran plot for each structure revealed that over 99% of residues were located in allowed regions of the plot. The refinement statistics for all structures are listed in Table I.
1. INTRODUCTION

Erythrina is a genus of deciduous leguminous trees and shrubs widely spread in the tropics and subtropics. Since 1980, lectins from some 20 species of this genus have been isolated in different laboratories, a dozen of them by us [1,2]. All Erythrina lectins studied are specific for galactose and N-acetylgalactosamine and show a pronounced preference for N-acetyllactosamine. They are glycoproteins (3-10% carbohydrate) with molecular masses in the range of 56000-68000 Da and are composed of two identical or nearly identical subunits. Whenever examined, the carbohydrate of the lectins was found to consist of glucosamine, mannose, L-fucose and xylose, predominantly in the form of the asparagine-linked heptasaccharide Manα3(Manα6)(Xylβ2)Manβ4GlcNAcβ4(Fucα3)GlcNAc [3]. The N-terminal amino acid sequences, determined on 10 Erythrina lectins for up to 15 amino acids, have been nearly identical [2].
We have now established the nearly complete sequence (the first 244 amino acids) of Erythrina corallodendron lectin (ECorL), as well as the glycosylation site. Comparison of this sequence with those of 9 other legume lectins reveals, as predicted [4], extensive homologies.

2. MATERIALS AND METHODS 

2.1. Preparation of ECorL 

The lectin extracted from the seeds of Erythrina corallodendron was purified by fractional precipitation with ammonium sulfate and affinity chromatography on a column of lactose coupled to divinylsulphone-activated Sepharose [2].

2.2. Enzyme digestion 

The following digestions were performed, all at an enzyme/substrate ratio of 1:50.

(i) ECorL (2 mg/ml) was heat-denatured at pH 2 [5] and digested with lysylendopeptidase-C (Calbiochem) in phosphate buffer, pH 7.8, for 6 h at 37°C [6].

(ii) Samples of ECorL, 10-15 mg in 1 ml of 0.2 M N-ethylmorpholine-HCl buffer, pH 8.3, were digested separately with trypsin (TPCK-treated, Sigma) and chymotrypsin (α-chymotrypsin, type VII, TLCK-treated, Sigma) for 5-10 h at 37°C.

(iii) ECorL, 7 mg/ml in 0.1 M sodium bicarbonate, pH 6.1, was digested with Staphylococcus aureus V8 protease (Miles) at 31°C. Half of the sample was removed after 24 h, while the remainder was kept for 5 days at room temperature.

(iv) The lectin, 10 mg in 2 ml of 0.2 M Tris-HCl, pH 8.8, was denatured by boiling for 10 min and digested with elastase (Sigma) for 75 min at 37°C.
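The cleavage specificities that make these digests complementary can be approximated in silico. This is a minimal sketch under stated assumptions:

- Trypsin is modelled as cleaving after Lys or Arg unless the next residue is Pro.
- Lysylendopeptidase-C (Lys-C) is modelled as cleaving after every Lys.

These are common simplified rules, not measured specificities. Chymotrypsin, V8 protease and elastase are left out because their preferences are broader. The demonstration sequence is hypothetical, not ECorL.

```python
"""Sketch: simplified in silico digestion with trypsin and Lys-C rules."""
import re
from typing import List

# Zero-width regular expressions marking the bond after each cleavage residue.
# These are common textbook approximations, not measured specificities.
CLEAVAGE_RULES = {
    "trypsin": r"(?<=[KR])(?!P)",  # after Lys or Arg, unless the next residue is Pro
    "lys-c": r"(?<=K)",            # after every Lys
}


def digest(sequence: str, enzyme: str) -> List[str]:
    """Split a one-letter sequence at the predicted cleavage sites of `enzyme`."""
    pieces = re.split(CLEAVAGE_RULES[enzyme], sequence)
    return [p for p in pieces if p]  # drop the empty piece left by a C-terminal site


if __name__ == "__main__":
    toy = "GAKPLRSVKE"  # hypothetical sequence, not ECorL
    for enzyme in CLEAVAGE_RULES:
        print(enzyme, digest(toy, enzyme))
```

The two digests cut the toy sequence at different bonds, which is why their peptides overlap one another.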

2.3. Chemical cleavage 

ECorL, 10 mg/ml, was cleaved with 75% formic acid in 6 M guanidine hydrochloride for 72 h at 37°C.
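Mild acid treatment of this kind is known to hydrolyse Asp-Pro peptide bonds preferentially. The Results attribute one of the main fragments to such a cleavage. The sketch below locates Asp-Pro bonds and predicts the fragments. It assumes complete cleavage at every Asp-Pro bond and no other acid-labile bonds. The function names and the toy sequence are hypothetical.

```python
"""Sketch: predict fragments from acid cleavage of Asp-Pro bonds."""
from typing import List, Tuple


def asp_pro_sites(sequence: str) -> List[int]:
    """1-based positions of Asp residues immediately followed by Pro."""
    return [i + 1 for i in range(len(sequence) - 1)
            if sequence[i] == "D" and sequence[i + 1] == "P"]


def acid_fragments(sequence: str) -> List[Tuple[int, str]]:
    """(start position, residues) for each fragment, assuming every D-P bond is cut."""
    cuts = asp_pro_sites(sequence)          # cleavage falls after each such Asp
    starts = [1] + [pos + 1 for pos in cuts]
    ends = cuts + [len(sequence)]
    return [(s, sequence[s - 1:e]) for s, e in zip(starts, ends)]
```

Each fragment after the first begins at a Pro. A cut after Asp n therefore yields a fragment starting at residue n + 1.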

2.4. Peptide separation and purification 

For the separation of the peptides in the lysylendopeptidase-C digest, aliquots (100 µg/well) were loaded onto gradient polyacrylamide slab gels containing 0.1% SDS and electrophoresed as described [7]. The separated peptides were electrophoretically transferred to a polyvinylidene difluoride membrane (PVDF, Millipore). Strips of the membrane were stained with Coomassie blue, and bands corresponding to the peptides were cut out from the membrane [8].

Peptides in the tryptic, chymotryptic and elastase digests of ECorL were fractionated on columns (1 x 200 cm) of Biogel P-6 in 0.1 M ammonium bicarbonate as in [9]. Fragments resulting from acid hydrolysis were fractionated on a column (1 x 200 cm) of Biogel P-30 in 70% formic acid [10]. Elution was followed by measuring the absorbance at 230 nm. Fractions corresponding to the peaks were collected, lyophilized and dissolved in 0.1% trifluoroacetic acid for further purification by reverse-phase HPLC. The peptides in the S. aureus V8 digest were purified directly by reverse-phase HPLC.

HPLC was carried out on a Vydac analytical reverse-phase column (25 cm x 4.6 mm; 218TP54, HP Genenchem, San Francisco, CA) in a Waters 600E multisolvent delivery system. Variable gradients of 0-50% acetonitrile (HPLC grade, Merck) in 0.1% trifluoroacetic acid were used [9,11]. Peptides were detected by measuring the absorbance at 214 nm and were collected manually, directly from the UV monitor.

2.5. Preparation of glycopeptide

ECorL (500 mg, containing 3% neutral sugar) was denatured and digested with 10 mg of pronase (Calbiochem) as described [5]. The digest was lyophilized, the dry material dissolved in 7 ml of 0.01 M acetic acid, and insoluble material removed by centrifugation. The supernatant was applied in two portions to a Sephadex G-50 column (160 cm long) using 0.01 M acetic acid as solvent. Fractions were collected, monitored by absorbance at 230 nm, and examined for their neutral sugar content by the phenol-sulfuric acid method [12], using mannose as standard. Sugar-containing fractions were pooled, lyophilized and rechromatographed on the same column under the same conditions as described above.
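Enzyme amounts implied by an enzyme/substrate ratio are simple to check, assuming the ratio is by weight (the text does not state this explicitly). For instance, 10 mg of pronase on 500 mg of ECorL reduces to the same 1:50 ratio quoted for the digests in 2.2. The helper names below are hypothetical.

```python
"""Sketch: enzyme amounts for a weight-based enzyme/substrate (E/S) ratio."""
from fractions import Fraction


def enzyme_mass_mg(substrate_mg: float, ratio_denominator: int = 50) -> float:
    """Enzyme mass (mg) for an E/S ratio of 1:ratio_denominator, assumed w/w."""
    return substrate_mg / ratio_denominator


def es_ratio(enzyme_mg: int, substrate_mg: int) -> Fraction:
    """Reduced E/S ratio from whole-milligram amounts."""
    return Fraction(enzyme_mg, substrate_mg)


if __name__ == "__main__":
    print(es_ratio(10, 500))   # pronase on ECorL in this section
    print(enzyme_mass_mg(10))  # enzyme for the 10 mg elastase digest in 2.2
```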

2.6. Deglycosylation

An aliquot of the rechromatographed material (575 nmol of mannose) was lyophilized, dissolved in 100 µl of 20 mM potassium phosphate buffer, pH 6.5, 25 mM EDTA, and treated with 1 U of endoglycosidase F (Boehringer-Mannheim) for 20 h at 37°C. The enzyme-treated material, as well as a sample of the untreated material, was analysed by reverse-phase HPLC using a gradient of 0-7% acetonitrile in 0.1% trifluoroacetic acid.

2.7. Sequence determination

Peptides derived from the various digests were subjected to microsequence analysis. The 4-N,N-dimethylaminoazobenzene 4'-isothiocyanate (DABITC)/phenyl isothiocyanate (PITC) double-coupling method was used, followed by thin-layer chromatographic identification of the amino acid derivatives liberated [13,14]. The following were sequenced on a gas-phase Applied Biosystems automatic sequencer, model 470A:

- the glycopeptide purified by HPLC;
- its endoglycosidase digestion product;
- the PVDF membranes containing peptides from the lysylendopeptidase-C digest.

3. RESULTS AND DISCUSSION 

3.1. Sequence determination

The sequence of 244 amino acids is shown in fig.1, together with the details of the overlapping peptides and fragments from which it was deduced. Digestion with trypsin and S. aureus V8 protease yielded peptides from which most of the sequence of ECorL could readily be established. Digestions with chymotrypsin, lysylendopeptidase-C and elastase gave good overlaps that were helpful in determining the missing residues.
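Deducing a continuous sequence from overlapping peptides amounts to chaining fragments through shared residues. The greedy sketch below merges peptides through their longest suffix/prefix overlap. It is a hypothetical, simplified illustration: actual sequence assignment also weighs enzyme specificity and the quality of each residue call. The peptides in the example are invented, and the minimum overlap of 3 residues is an assumption.

```python
"""Sketch: greedy assembly of overlapping peptide sequences."""
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a equal to a prefix of b (0 if < min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def _prune(contigs: List[str]) -> List[str]:
    """Drop duplicates and peptides wholly contained in another contig."""
    unique = list(dict.fromkeys(contigs))
    return [p for p in unique if not any(p != q and p in q for q in unique)]


def assemble(peptides: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair with the largest overlap until none remain."""
    contigs = _prune(peptides)
    while True:
        best = (0, -1, -1)
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best[0]:
                        best = (k, i, j)
        k, i, j = best
        if k == 0:
            return contigs
        merged = contigs[i] + contigs[j][k:]
        rest = [c for n, c in enumerate(contigs) if n not in (i, j)]
        contigs = _prune(rest + [merged])
```

A greedy merge can mis-assemble repetitive sequences. This is one reason overlaps from several enzymes with different specificities were needed.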
Fig.1. Alignment of ECorL and peptides used for sequence determination.

- L: peptides obtained from digestion with lysylendopeptidase-C.
- T: tryptic peptides.
- C: chymotryptic peptides.
- E and V: peptides obtained by digestion with elastase and S. aureus V8 protease, respectively.
- H: peptides obtained by hydrolysis with dilute formic acid in 6 M guanidine hydrochloride.

Solid lines indicate regions of peptides sequenced by the manual DABITC/PITC method, except for the lysylendopeptidase-C peptides, which were sequenced by the automatic sequencer. Dashed lines indicate residues that were not sequenced or yielded an unsatisfactory result.

Cleavage with dilute formic acid in 6 M guanidine HCl yielded 3 main fragments. One of them, resulting from hydrolysis of the peptide bond Asp136-Pro137, facilitated the elucidation of the sequence from amino acid 137 onwards. Each of the residues shown in fig.1 was identified at least twice in the sequence analysis, using either the manual DABITC/PITC double-coupling method or the automatic sequencer.
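The criterion that every residue be identified at least twice can be checked mechanically once each sequenced stretch is mapped onto the final sequence. This is a minimal sketch. It assumes each sequencing result is recorded as a hypothetical (start, end) pair of 1-based, inclusive residue positions, and uses a difference array to count coverage.

```python
"""Sketch: flag residues identified fewer than a required number of times."""
from typing import List, Tuple


def coverage(length: int, reads: List[Tuple[int, int]]) -> List[int]:
    """Per-residue count of reads; reads are (start, end), 1-based inclusive."""
    delta = [0] * (length + 1)  # difference array with one spare slot
    for start, end in reads:
        delta[start - 1] += 1
        delta[end] -= 1
    counts, running = [], 0
    for i in range(length):
        running += delta[i]
        counts.append(running)
    return counts


def under_covered(length: int, reads: List[Tuple[int, int]], minimum: int = 2) -> List[int]:
    """1-based positions identified fewer than `minimum` times."""
    return [i + 1 for i, c in enumerate(coverage(length, reads)) if c < minimum]
```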

Microheterogeneity was observed at positions 22 (with Ala replacing Gly) and 199 (with Val replacing Asp). This suggests that the two subunits of ECorL differ in sequence at these positions.

3.2. Glycosylation site 

Digestion of ECorL by pronase, followed by gel filtration on Sephadex G-50, afforded a crude glycopeptide preparation in good yield (77%, based on the neutral sugar content of the starting material). Fractionation of this preparation on reverse-phase HPLC led to the isolation of 5 main (glyco)peptides (fig.2A). HPLC of the crude glycopeptide after deglycosylation by endoglycosidase F (fig.2B) showed a single change in the enzyme-treated sample, i.e. a shift of peptide no. 2 to a new position (peptide no. 6),
3.4. Homologies

Comparison of the amino acid sequence of ECorL with that of other legume lectins (fig.3) reveals a high degree of homology. The amino acids in 39 positions are invariant in all the lectins listed.

Notably, these highly conserved amino acids include 4 residues that correspond to residues previously identified in ConA [17] as important in the binding of Ca²⁺ and Mn²⁺: Asp10(…), Asp19(…), His24(…) and Ser34(…). (The numbers without parentheses refer to the ConA sequence; those in parentheses are of pro-ConA after alignment with ECorL as given in fig.3.) Another metal-binding residue, Glu8(…), is nearly invariant, except in peanut agglutinin, where glutamine is found instead.

Similarly, the amino acids forming the hydrophobic cavity in the 3-dimensional structure of ConA [17,18] occupy homologous positions in all the lectins. They are either invariant (Val(…), Phe(…) and Phe(…)) or highly conserved (e.g. Ser(…)). On the other hand, the residues that constitute the monosaccharide binding site in ConA [17] appear to be poorly conserved. For instance, Asp208(…) is the only residue of this site maintained in the other lectins, except in leukoagglutinin from Phaseolus vulgaris, where it is replaced by a valine residue.

It is also of interest that N-glycosylation triplets (Asn-X-Ser/Thr) are absent from some of the lectins. When present, they are located at different positions in the primary structure and are not always glycosylated:

- ECorL has a single occupied glycosylation site, at Asn17.
- PHA-L has two occupied sites, at positions 10 and 62 [18,19].
- Soybean agglutinin has 3 glycosylation sites, at positions 40, 75 and 114, only one of which (position 75) is occupied [20].
- In favin, the sugar moiety is located at Asn17(…) [21].
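Candidate N-glycosylation triplets can be located with a simple pattern search. The sketch below scans a one-letter sequence for Asn-X-Ser/Thr. By default it also excludes Pro at position X, a widely used refinement that the text does not mention. As the list above shows, a matching triplet need not be occupied. The function name and the example sequence are hypothetical.

```python
"""Sketch: locate Asn-X-Ser/Thr N-glycosylation triplets (sequons)."""
import re
from typing import List


def sequons(sequence: str, exclude_pro: bool = True) -> List[int]:
    """1-based positions of Asn residues that begin an Asn-X-Ser/Thr triplet."""
    x = "[^P]" if exclude_pro else "."
    # A zero-width lookahead lets overlapping triplets (e.g. N-N-S-T) all match.
    pattern = re.compile(rf"(?=N{x}[ST])")
    return [m.start() + 1 for m in pattern.finditer(sequence)]
```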

The similarity in primary structure between ECorL and the other legume lectins listed in fig.3 further supports the proposal that all these lectins share a common evolutionary origin. Their structures have been highly conserved in evolution, presumably to ensure the maintenance of an important physiological function(s) yet to be determined.
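A statement such as "39 positions are invariant in all the lectins listed" comes from a column-wise comparison of an alignment. The sketch below makes these assumptions:

- The sequences are already aligned to equal length, with '-' marking gaps.
- A column counts as invariant only when every sequence carries the same residue there.
- Gapped columns are excluded.

The toy sequences are invented and are not the fig.3 alignment.

```python
"""Sketch: find alignment columns that are invariant in every sequence."""
from typing import List


def invariant_columns(aligned: List[str], gap: str = "-") -> List[int]:
    """1-based alignment columns where all sequences share one non-gap residue."""
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("expected a non-empty list of equal-length aligned sequences")
    result = []
    for col, residues in enumerate(zip(*aligned), start=1):
        if gap not in residues and len(set(residues)) == 1:
            result.append(col)
    return result
```

Counting "highly conserved" rather than strictly invariant positions would need a substitution-aware score instead of this exact-match test.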
Structural analysis of two crystal forms of lentil lectin at 1.8 Å resolution.

Loris R, Van Overberge D, Dao-Thi MH, Poortmans F, Maene N, Wyns L.

Laboratorium voor Ultrastructuur, Vrije Universiteit Brussel, Sint-Genesius-Rode, Belgium.

The structures of two crystal forms of lentil lectin are determined and
refined at high resolution. Orthorhombic lentil lectin is refined at 1.80 Å
resolution to an R-factor of 0.184 and monoclinic lentil lectin at 1.75 Å
resolution to an R-factor of 0.175. These two structures are compared to
each other and to the other available legume lectin structures. The 
monosaccharide binding pocket of each lectin monomer contains a 
tightly bound phosphate ion. This phosphate makes hydrogen bonding 
contacts with Asp-81 beta, Gly-99 beta, and Asn-125 beta, three residues 
that are highly conserved in most of the known legume lectin sequences 
and essential for monosaccharide recognition in all legume lectin crystal 
structures described thus far. A detailed analysis of the composition and 
properties of the hydrophobic contact network and hydrophobic nuclei in 
lentil lectin is presented. Contact map calculations reveal that dense 
clusters of nonpolar as well as polar side chains play a major role in 
secondary structure packing. This is illustrated by a large cluster of 24 
mainly hydrophobic amino acids that is responsible for the majority of 
packing interactions between the two beta-sheets. Another series of four 
smaller and less hydrophobic clusters is found to mediate the packing of 
a number of loop structures upon the front sheet. A very dense, but not 
very conserved cluster is found to stabilize the transition metal binding 
site. The highly conserved and invariant nonpolar residues are 
distributed asymmetrically over the protein. 

PMID: 7731952 [PubMed - indexed for MEDLINE]
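The contact-map analysis summarised in this abstract can be roughly approximated as follows:

1. Mark residue pairs whose representative points lie within a distance cutoff.
2. Group the contacting residues into connected clusters.

This sketch is a simplified stand-in, not the authors' procedure. The 4.5 Å cutoff, the one-point-per-residue representation, the residue labels and the coordinates are all assumptions.

```python
"""Sketch: distance-cutoff contact map and connected residue clusters."""
import math
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]


def contact_map(coords: Dict[str, Coord], cutoff: float = 4.5) -> Dict[str, Set[str]]:
    """Neighbours of each residue: all others within `cutoff` angstroms."""
    names = list(coords)
    contacts: Dict[str, Set[str]] = {n: set() for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if math.dist(coords[a], coords[b]) <= cutoff:
                contacts[a].add(b)
                contacts[b].add(a)
    return contacts


def clusters(contacts: Dict[str, Set[str]]) -> List[Set[str]]:
    """Connected components of the contact graph (iterative depth-first search)."""
    seen: Set[str] = set()
    groups: List[Set[str]] = []
    for start in contacts:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(contacts[node] - group)
        seen |= group
        groups.append(group)
    return groups
```

Connected components join residues through chains of contacts, so two residues far apart can share a cluster. A density threshold would be needed to separate "dense" clusters as described in the abstract.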
Biochem Biophys Res Commun. 2000 Feb 16;268(2):262-7.

Erratum in: Biochem Biophys Res Commun. 2000 Apr 2;270(1):329.

Mode of molecular recognition of L-fucose by fucose-binding legume lectins.

Thomas CJ, Surolia A.

Molecular Biophysics Unit, Indian Institute of Science, Bangalore 560012, India.

Recognition of cell surface carbohydrate moieties by lectins plays a vital role in many biological processes. Fucosylated residues are often implicated as key recognition markers in many cellular processes. In particular, the molecular recognition of fucose by the fucose-binding lectins UEA I and LTA poses a special case because no crystal structure of these lectins is available. The study was conducted to elucidate the recognition of L-fucose by UEA I and LTA by correlating a structure-based sequence alignment with other available biochemical/biophysical data. The study points out that recognition of L-fucose is coordinated by the invariant triad of residues: asparagine 137, glycine 105 and aspartate 87. The major hydrophobic stacking residue in this case is tyrosine 220. The study also reiterates the key role of the conserved triad of residues in the combining site, a feature common to all legume lectins whose crystal structures are known. Copyright 2000 Academic Press.
Three-dimensional structure of the soybean agglutinin Gal/GalNAc complexes by homology modeling.

Rao VS, Lam K, Qasba PK.

Structural Glycobiology Section, Laboratory of Experimental and 
Computational Biology, National Cancer Institute, NCI-FCRDC, 
Frederick, Maryland 21702, USA. 

Complexes of soybean agglutinin (SBA) with galactose (Gal) and N-acetylgalactosamine (GalNAc) have been modeled based on its homology to Erythrina corallodendron (EcorL) lectin. The three-dimensional structure of SBA-Gal modeled with homology techniques agrees well with the SBA-(beta-LacNAc)2Gal-R complex determined by X-ray crystallography at the beta-sheet regions and at the regions where Ca2+ and Mn2+ ions bind. However, significant deviations have been observed between the modeled and the X-ray structures, particularly at the loop regions where the polypeptide chain could not be unequivocally traced in the X-ray structure. The hydrogen-bonding scheme predicted from the homology model shows that the invariant residues found in all other legume lectins (Asp, Gly, Asn and an aromatic residue, Phe) bind Gal in a slightly different way than reported in the X-ray structure of the SBA-pentasaccharide complex. The higher binding affinity of GalNAc over Gal to SBA is due to additional hydrophobic interactions with Tyr107, rather than to a hydrogen bond between the N-acetamide group of the sugar and the side chain of Asp88 as suggested from X-ray crystal structure studies. Our modeling also suggests that the variation in the length of loop D observed among galactose-binding legume lectins may not have any effect on the binding of sugar at the monosaccharide-specific site of the lectins.

Soybean agglutinin (SBA) is a member of the leguminous family of lectins. These lectins generally possess a single carbohydrate-binding site, besides the tightly bound Ca2+ and Mn2+ ions that are required for their carbohydrate-binding activity. They share a high degree of sequence homology, and about 50% of the amino acid residues are invariant. Some of these invariant amino acid residues are involved in the binding of sugar moieties and in metal ion coordination. X-ray crystallographic studies showed that their three-dimensional structures are very similar, although they differ in their carbohydrate-binding specificity (1-6). Three of the invariant residues, Asp, Gly and Asn, together with an aromatic residue (Phe or Tyr), are involved in carbohydrate binding. Independent of their sugar specificity, these four residues provide the basic frame to which the sugar binds in legume lectins.
Architecture of the sugar binding sites in carbohydrate binding proteins—a computer modeling study.

Rao VS, Lam K, Qasba PK. 

Structural Glycobiology Section, Laboratory of Experimental and 
Computational Biology, National Cancer Institute, NCI-FCRDC, 
Frederick, MD 21702-1201, USA. 

Different sugars (Gal, GalNAc and Man) were docked at the monosaccharide-binding sites of Erythrina corallodendron (EcorL), peanut lectin (PNA), Lathyrus ochrus (LOL) and pea lectin (PSL) to study the lectin-carbohydrate interactions. In the complexes, the hydroxymethyl groups of Man and Gal favor the gg and gt conformations, respectively, and are the dominant recognition determinant. The monosaccharide-binding site of lectins specific to Gal/GalNAc is wider than that of lectins specific to Man/Glc, owing to the additional amino acid residues in loop D. This affects the hydrogen bonds between the sugar and residues from loop D, but not the orientation of the sugar in the binding site. The invariant residues Asp from loop A, and Asn and an aromatic residue (Phe or Tyr) in loop C, provide the basic architecture to recognize the common features of C4 epimers. The invariant Gly in loop B, together with one or two residues in the variable region of loop D/A, holds the sugar tightly at both ends; loss of any one of these hydrogen bonds leads to weak interaction. Gaps in the amino acid sequence near the sugar-binding site of loop D/A vary in size and location. The resulting subtle variations in the sequence and conformation of that peptide fragment seem to discriminate between sugars that differ at the C4 atom (galacto and gluco configurations). The variations at loop B are important in discriminating between Gal and GalNAc binding. The present study thus provides a structural basis for the observed specificities of legume lectins, which use the same four invariant residues for binding. These studies also provide information that is important for the design/engineering of proteins with a desired carbohydrate specificity.